Why a Bank's Stress Test Comes Down to the Quality of Its Data

Every figure a bank submits to its regulator rests on data it gathered somewhere else, and the regulator trusts that figure only as far as it can trust the data beneath it. When the underlying records are inconsistent or incomplete, so are the numbers built on them, and the cost starts accumulating long before anyone notices the gap. Poor data quality costs organizations an average of $12.9 million a year, and the figure runs higher in heavily regulated sectors, where a single inaccurate report can trigger penalties. In banking, the stakes go past the balance sheet: whether a stress test is passed or failed turns on whether the data behind it holds up to scrutiny.
Riazullah Khan is a Senior Data Engineer with more than 17 years across media, banking, and healthcare. Earlier in his career he built the data quality and governance foundation that allowed a mid-sized bank to meet Federal Reserve stress-testing requirements after a series of mergers placed it under large-bank regulation. He has since written about the discipline that work demands, including a HackerNoon article analyzing the pillars of data governance and why they matter, which lays out how ownership, quality, and accountability keep enterprise data dependable. His position is plain: governance is not paperwork, it is what makes a number worth acting on.
When Bad Data Becomes a Compliance Problem
The financial damage from unreliable data is larger than most balance sheets admit. Companies lose an estimated 15% to 25% of revenue each year to poor data quality, through wasted effort, missed opportunities, and decisions made on the wrong inputs. For a bank, that loss carries a sharper consequence. Inaccurate data erodes revenue, and it also threatens the regulatory standing that protects customer deposits and public confidence.
When the bank Khan worked with entered a larger banking group through acquisition, it crossed the threshold that subjects institutions to the Comprehensive Capital Analysis and Review, the Federal Reserve's annual stress test introduced after the 2008 financial crisis. The bank's data was not ready for that scrutiny. Records sat in silos across credit cards, loans, and deposits, with duplicates, missing fields, and inconsistent values that made reliable reporting difficult. Khan was brought in to build a data quality foundation strong enough to satisfy a federal regulator.
"A stress test is only as honest as the data feeding it," says Riazullah Khan. "You can have the most sophisticated capital model in the world, but if the inputs are wrong, you are handing the regulator a confident answer that happens to be false."
Turning Regulation Into Rules a Small Team Can Run
Regulatory language does not arrive as code. A capital-planning requirement reads as a paragraph of intent, and someone has to translate it into specific, testable checks against millions of records. The harder problem is doing that at scale without a large team, where every new rule risks becoming another piece of fragile, hand-maintained logic. Hundreds of one-off checks turn unmanageable long before they turn comprehensive.
Khan designed a metadata-driven framework that treats each data quality rule as configuration rather than custom code. Core tables hold the rules, their categories, the feeds they apply to, and their results, so a new check can be added by defining it rather than rebuilding a pipeline. He mapped business requirements to standard quality dimensions such as completeness, accuracy, and consistency, then organized them by domain so each part of the bank owned its own rules. Reusable components kept hundreds of checks maintainable, while partitioning and indexing let the framework process terabyte-scale data inside tight reporting windows. For much of the project he carried this work alone, covering architecture, modeling, and development.
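To make the configuration-over-code idea concrete, here is a minimal Python sketch, assuming each rule is stored as a row carrying its owning domain, quality dimension, target feed, column, and check type. The `Rule` fields, the `CHECKS` registry, the `run_rules` function, the rule IDs, and the 0 to 35 percent rate bound are all hypothetical illustrations, not Khan's actual schema; a production framework would read rules from database tables and push checks down into SQL rather than looping in Python.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical check implementations, keyed by the check_type stored with each rule.
# Each receives one column value plus the rule's parameters and returns True on pass.
CHECKS: dict[str, Callable[[Any, dict], bool]] = {
    "not_null": lambda value, params: value is not None and value != "",
    "in_range": lambda value, params: value is not None and params["min"] <= value <= params["max"],
    "in_set": lambda value, params: value in params["allowed"],
}


@dataclass
class Rule:
    """One row of a hypothetical rules table: the check is data, not code."""
    rule_id: str
    domain: str       # owning business area, e.g. "loans"
    dimension: str    # completeness, accuracy, consistency, ...
    feed: str         # data feed the rule applies to
    column: str       # column the check inspects
    check_type: str   # key into CHECKS
    params: dict = field(default_factory=dict)


@dataclass
class RuleResult:
    rule_id: str
    checked: int
    failed_keys: list


def run_rules(feed: str, records: list[dict], rules: list[Rule], key: str = "id") -> list[RuleResult]:
    """Apply every configured rule for `feed` to `records` and collect per-rule results."""
    results = []
    for rule in rules:
        if rule.feed != feed:
            continue  # rules for other feeds are skipped
        check = CHECKS[rule.check_type]
        failed = [rec[key] for rec in records if not check(rec.get(rule.column), rule.params)]
        results.append(RuleResult(rule.rule_id, len(records), failed))
    return results


# Adding a check means adding a row here, not writing a new pipeline.
rules = [
    Rule("LN-001", "loans", "completeness", "loan_feed", "balance", "not_null"),
    Rule("LN-002", "loans", "accuracy", "loan_feed", "rate", "in_range", {"min": 0.0, "max": 0.35}),
    Rule("CC-001", "cards", "consistency", "card_feed", "status", "in_set", {"allowed": {"OPEN", "CLOSED"}}),
]
records = [
    {"id": 1, "balance": 1000.0, "rate": 0.05},
    {"id": 2, "balance": None, "rate": 0.07},
    {"id": 3, "balance": 250.0, "rate": 1.2},
]
for result in run_rules("loan_feed", records, rules):
    print(result.rule_id, result.checked, result.failed_keys)
```

Run against the loan feed, the completeness rule flags record 2 and the accuracy rule flags record 3, while the card rule is never applied. The point of the design is that the engine stays fixed as the rule list grows.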
"Writing one rule is easy. The trap is the thousandth rule," Khan notes. "If adding a check means writing new code every time, the system collapses under its own weight before it ever earns the regulator's trust."
A Discipline That Went From Niche to Industry Standard
What was once a specialized obligation for the largest banks has become an industry-wide investment. The data governance market is on track to grow from $5.38 billion in 2026 to $24.07 billion by 2034, a compound annual growth rate of 20.5%, with compliance management its fastest-growing segment. Banking, financial services, and insurance lead that spending, driven by regulators who now expect institutions to show the lineage behind every reported number.
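As a quick arithmetic check on those market figures, the compound annual growth rate over the eight years from 2026 to 2034 follows from the standard formula, using only the two endpoint values quoted above:

$$
\text{CAGR} = \left(\frac{V_{2034}}{V_{2026}}\right)^{1/8} - 1
= \left(\frac{24.07}{5.38}\right)^{1/8} - 1
\approx 4.474^{0.125} - 1
\approx 0.206
$$

That works out to roughly 20.6 percent a year, in line with the 20.5 percent rate the projection cites.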
Khan built more than a set of checks. He established data stewardship roles so that specific people were accountable for specific data, created a business glossary that standardized definitions across departments, and produced scorecards that gave stewards daily visibility into the health of their data. He brings the same evaluative instinct to his role as a judge at the Beta University AI Super Hackathon, where he assesses how teams turn raw data and models into systems that hold up under real conditions. Judging early-stage work, he says, reinforces how often a clever idea fails on the unglamorous question of whether its data can be trusted.
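A stewardship scorecard of the kind described here can be sketched as a simple aggregation over rule results. This is a minimal Python illustration, assuming each result row records a domain, a quality dimension, and counts of records checked and failed; the sample rows, the `scorecard` and `below_threshold` functions, and the 99 percent threshold are hypothetical, not the bank's actual reporting.

```python
from collections import defaultdict

# Hypothetical rule-result rows: (domain, dimension, records_checked, records_failed).
# In practice these would come from the framework's results table after each load.
results = [
    ("loans", "completeness", 1000, 12),
    ("loans", "accuracy", 1000, 3),
    ("deposits", "completeness", 500, 0),
    ("deposits", "consistency", 500, 25),
]


def scorecard(rows):
    """Aggregate results into {domain: {dimension: pass_rate}}, pass_rate in [0, 1]."""
    checked = defaultdict(int)
    failed = defaultdict(int)
    for domain, dimension, n_checked, n_failed in rows:
        checked[(domain, dimension)] += n_checked
        failed[(domain, dimension)] += n_failed
    card = defaultdict(dict)
    for (domain, dimension), n in checked.items():
        # A dimension with nothing to check is reported as fully passing.
        card[domain][dimension] = 1.0 - failed[(domain, dimension)] / n if n else 1.0
    return dict(card)


def below_threshold(card, threshold):
    """List (domain, dimension, pass_rate) entries under `threshold`, sorted by domain and dimension."""
    flagged = [
        (domain, dimension, rate)
        for domain, dims in card.items()
        for dimension, rate in dims.items()
        if rate < threshold
    ]
    return sorted(flagged)


card = scorecard(results)
for domain, dimension, rate in below_threshold(card, threshold=0.99):
    print(f"{domain}/{dimension}: {rate:.1%}")
```

Grouping by domain mirrors the ownership model: each steward looks at their own domain's row and sees which dimensions slipped under the bar that day.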
"Governance gets dismissed as bureaucracy until the day a number is wrong and no one can say why," Khan explains. "Ownership, definitions, and lineage are what let you answer that in minutes instead of weeks."
Catching the Error Before It Reaches the Filing
The dangerous failure in a reporting system is not the one that stops it. It is the number that looks correct, passes through unchallenged, and lands in a regulatory filing carrying an error no one caught. By the time the discrepancy surfaces, it has already shaped a decision or a submission, and unwinding it costs far more than catching it at the source would have.
Khan built the framework to find those errors early. After each data load, automated profiling and validation run against the defined rules, and any records that fail are flagged rather than passed downstream. A feedback loop routes non-compliant data back to the source teams for correction, then revalidates it once fixed, so the same error does not resurface. Data lineage traces every reported value back to its origin, which means that when a regulator or an executive questions a number, its full history is available rather than reconstructed from memory.
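The load-time flow described above, validate after each load, hold back failures, and revalidate corrected records before releasing them, can be sketched in a few functions. This is a minimal Python illustration under stated assumptions: `validate_loan` is a hypothetical stand-in for the configured rules, and `LoadOutcome`, `process_load`, and `resubmit` are illustrative names, not the framework's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

# A validator returns failure reasons for one record; an empty list means it passes.
Validator = Callable[[dict], list]


def validate_loan(record: dict) -> list:
    """Hypothetical loan-feed validator standing in for the configured rules."""
    reasons = []
    if record.get("balance") is None:
        reasons.append("balance missing")
    rate = record.get("rate")
    if rate is None or not 0.0 <= rate <= 0.35:
        reasons.append("rate out of range")
    return reasons


@dataclass
class LoadOutcome:
    accepted: list = field(default_factory=list)     # records released downstream
    quarantined: dict = field(default_factory=dict)  # record id -> reasons, sent to the source team


def process_load(records: list, validate: Validator) -> LoadOutcome:
    """Validate a batch right after load; failures are held back instead of passed on."""
    outcome = LoadOutcome()
    for rec in records:
        reasons = validate(rec)
        if reasons:
            outcome.quarantined[rec["id"]] = reasons
        else:
            outcome.accepted.append(rec)
    return outcome


def resubmit(outcome: LoadOutcome, corrected: list, validate: Validator) -> LoadOutcome:
    """Revalidate records the source team reports as fixed; release only those that pass."""
    for rec in corrected:
        if rec["id"] not in outcome.quarantined:
            continue  # only previously flagged records re-enter through this path
        reasons = validate(rec)
        if reasons:
            outcome.quarantined[rec["id"]] = reasons  # still failing, stays quarantined
        else:
            del outcome.quarantined[rec["id"]]
            outcome.accepted.append(rec)
    return outcome


batch = [
    {"id": 1, "balance": 1000.0, "rate": 0.05},
    {"id": 2, "balance": None, "rate": 0.07},
    {"id": 3, "balance": 250.0, "rate": 1.2},
]
outcome = process_load(batch, validate_loan)
# The source team fixes record 2, but record 3 comes back still out of range.
outcome = resubmit(outcome, [
    {"id": 2, "balance": 480.0, "rate": 0.07},
    {"id": 3, "balance": 250.0, "rate": 0.9},
], validate_loan)
print([rec["id"] for rec in outcome.accepted], outcome.quarantined)
```

The key choice is that a corrected record goes through the same validation as a fresh one, so a fix that does not actually fix anything stays in quarantine rather than slipping downstream.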
"The goal is to fail loudly and early," Khan observes. "A bad record caught at ingestion is a footnote. The same record caught in a filing is a crisis."
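The lineage half of that design can be pictured as a graph in which every reported value points to the datasets it was derived from. This is a minimal Python sketch, assuming lineage is kept as a simple mapping from each node to its direct upstream nodes; the node names, such as `report.tier1_capital_ratio`, and the `trace` and `origins` helpers are hypothetical examples, not the bank's actual lineage model.

```python
# Hypothetical lineage graph: each node lists the upstream nodes it was derived from.
LINEAGE = {
    "report.tier1_capital_ratio": ["mart.tier1_capital", "mart.risk_weighted_assets"],
    "mart.tier1_capital": ["staging.gl_balances"],
    "mart.risk_weighted_assets": ["staging.loans", "staging.cards"],
    "staging.gl_balances": ["source.general_ledger"],
    "staging.loans": ["source.loan_system"],
    "staging.cards": ["source.card_system"],
}


def trace(node: str, graph: dict) -> list:
    """Return every upstream node of `node`, depth-first, each listed once."""
    seen, order = set(), []

    def visit(current: str) -> None:
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                visit(parent)

    visit(node)
    return order


def origins(node: str, graph: dict) -> list:
    """Return only the upstream nodes with no parents of their own, i.e. the source systems."""
    return [n for n in trace(node, graph) if not graph.get(n)]


print(origins("report.tier1_capital_ratio", LINEAGE))
```

When someone questions the reported ratio, the same walk that finds its source systems also lists every intermediate table in between, which is what lets the history be read off rather than reconstructed.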
Why Governance Matters More as Banks Automate
As banks lean harder on automation and machine learning, the discipline behind their data grows more important, not less. Regulators now expect institutions to explain how an automated model reached a conclusion, which means tracing its inputs through the same lineage and quality controls that govern human reporting. A model trained on unverified data does not hesitate before acting on it, so the controls underneath it carry more weight than they did when a person reviewed every figure. The faster a bank moves, the more it depends on knowing its data is right.
For Khan, the lesson carries forward unchanged. He treats stewardship, validation, and lineage as the foundation that everything faster is built on, whether the consumer is a quarterly filing or a real-time model. The institutions that automate well, in his view, are the ones that treated their data as a governed asset long before regulators required it.
"You cannot automate your way out of bad data. You only automate the mistakes faster," Khan reflects. "Get the governance right and the trust takes care of itself. That holds for a stress test, and it will hold for whatever comes after it."