Before the Model: Why Data Pipeline Modernization Is the Real Work of Enterprise AI
Every company wants an AI strategy. Far fewer have the data to make one work. The gap between the two has become one of the defining problems of enterprise AI: a model is only as good as the information it can access, and many organizations still cannot access their own data cleanly. Gartner expects that, through 2026, organizations will abandon 60% of AI projects that are not supported by AI-ready data, and 63% of organizations either lack or are unsure whether they have the right data management practices for AI. The bottleneck is rarely the algorithm. It is everything that feeds it.
Haricharan Shivram Suresh Chandra Kumar has spent 14 years building that missing foundation. A principal data engineer who designs and scales data and AI platforms for a US health-insurance marketplace, he specializes in the pipelines, real-time streaming, and production ML and LLM operations that decide whether a model gets usable data or garbage. In a published technical paper, he proposed a metadata-driven framework for delivering data from many sources at scale, the kind of plumbing that rarely makes headlines but determines whether an AI system works in production or stalls in a pilot. In healthcare, where the data is sensitive, fragmented across systems, and bound by strict compliance rules, that foundation is harder to build and more consequential to get right.

The Bottleneck Was Never the Model
Adoption is no longer the issue. By 2025, 88% of organizations reported using AI in at least one part of the business, yet only about a third had managed to scale it beyond experiments. The pattern repeats across industries: a promising proof of concept, real excitement, then a deployment that stalls when the model meets the company's actual data, scattered across systems that do not agree with each other.
Haricharan works at the point where that stall happens. His focus is the layer between raw enterprise data and the models that consume it: ingesting data from multiple sources, reconciling formats, and delivering it quickly and reliably enough for real-time use. The hard problems are rarely glamorous. They are schema drift, late-arriving records, and the dozens of small inconsistencies that quietly poison a model's output if no one catches them upstream. None of it is visible in a demo, and all of it decides whether the system survives real traffic.
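To make "schema drift" and "late-arriving records" concrete, here is a minimal Python sketch of a check that runs where a record enters the pipeline. It is not taken from Haricharan's systems: the expected schema, the field names, the watermark, and the lateness window are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical expected schema for one source: field name -> Python type.
EXPECTED_SCHEMA = {"member_id": str, "plan_code": str, "premium": float, "event_time": datetime}


@dataclass
class CheckResult:
    missing: list = field(default_factory=list)     # expected fields absent from the record
    unexpected: list = field(default_factory=list)  # fields the schema does not know about (drift)
    wrong_type: list = field(default_factory=list)  # fields present but with the wrong type
    late: bool = False                               # event time is behind watermark minus lateness

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.wrong_type or self.late)


def check_record(record: dict, watermark: datetime, allowed_lateness: timedelta) -> CheckResult:
    """Flag schema drift and late arrival for one record before it moves downstream."""
    result = CheckResult()
    result.missing = sorted(k for k in EXPECTED_SCHEMA if k not in record)
    result.unexpected = sorted(k for k in record if k not in EXPECTED_SCHEMA)
    result.wrong_type = sorted(
        k for k, expected_type in EXPECTED_SCHEMA.items()
        if k in record and not isinstance(record[k], expected_type)
    )
    event_time = record.get("event_time")
    if isinstance(event_time, datetime):
        # Late if the event falls before the watermark minus the allowed lateness window.
        result.late = event_time < watermark - allowed_lateness
    return result
```

Given a record whose `premium` arrives as the string "410.00", with an extra `agent_notes` column and an event time of 11:00 checked against a 12:00 watermark and a 15-minute lateness window, `check_record` reports `premium` as the wrong type, `agent_notes` as unexpected, and the record as late. In a real streaming engine the watermark would typically be tracked by the engine itself rather than passed in by hand.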
"Everyone wants to talk about the model," Haricharan says. "The work that actually decides whether it succeeds happens long before that, in how the data gets in, cleaned up, and delivered. Get that wrong and the smartest model in the world gives you nonsense."
The Hidden Cost of Bad Data
The price of getting it wrong is measurable. Gartner has estimated that poor data quality costs the average organization $12.9 million a year, through bad decisions, wasted engineering time, and errors that compound as they move downstream. In an AI system, those errors do not stay in a report. They get amplified, because a model trained or prompted on flawed data produces flawed output at scale, quickly, and with a confidence that makes the mistakes harder to spot.
Haricharan designs pipelines to stop that at the source. His metadata-driven approach treats the rules for how data should look as a managed asset, so new sources can be added without hand-coding each one and so problems are caught where they enter rather than after they spread. The aim is a delivery system that scales across many sources without multiplying the ways it can fail, which is the difference between a pipeline that serves a single use case and one that serves the whole company.
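The general idea of a metadata-driven design can be sketched in a few lines. The following is a hypothetical illustration, not the framework from his paper: per-source rules live in a config structure (`SOURCE_METADATA`, with made-up source and column names), and one generic `ingest` function applies whichever rules match the source, rejecting bad rows at the point of entry.

```python
from typing import Any

# Hypothetical source metadata: each entry maps a source's raw column names onto a
# shared target schema, with a cast function and a "required" flag. Onboarding a new
# source means adding an entry here, not writing a new ingestion routine.
SOURCE_METADATA: dict[str, dict[str, dict[str, Any]]] = {
    "crm_export": {
        "member_id": {"source_col": "MemberID", "cast": str, "required": True},
        "premium": {"source_col": "MonthlyPremium", "cast": float, "required": True},
    },
    "billing_feed": {
        "member_id": {"source_col": "mbr_id", "cast": str, "required": True},
        "premium": {"source_col": "prem_amt", "cast": float, "required": False},
    },
}


def ingest(source: str, raw_rows: list[dict]) -> tuple[list[dict], list[tuple[int, str]]]:
    """Apply one source's metadata to raw rows and return (clean_rows, rejects).

    Each reject carries the row index and a reason, so bad data is caught
    where it enters instead of flowing downstream.
    """
    rules = SOURCE_METADATA[source]
    clean, rejects = [], []
    for i, row in enumerate(raw_rows):
        out, error = {}, None
        for target, rule in rules.items():
            value = row.get(rule["source_col"])
            if value in (None, ""):
                if rule["required"]:
                    error = f"missing required field {target}"
                    break
                out[target] = None  # optional field left empty
                continue
            try:
                out[target] = rule["cast"](value)
            except (TypeError, ValueError):
                error = f"cannot cast {target}={value!r}"
                break
        if error:
            rejects.append((i, error))
        else:
            clean.append(out)
    return clean, rejects
```

Adding a third source would mean adding another entry to `SOURCE_METADATA` while `ingest` stays unchanged. In practice such metadata would more likely live in a table or a versioned config file than in code.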
"Bad data is more dangerous in AI than it was in reporting," he notes. "A wrong number in a dashboard is a wrong number. A wrong input to a model becomes a wrong answer that sounds right. You have to catch it upstream, because downstream it is invisible."
Turning Conversations Into Data
Most enterprise data was never built for analysis. Industry estimates suggest that roughly 80% of enterprise data is unstructured, sitting in emails, documents, recordings, and call transcripts that traditional systems cannot query. For a business that talks to customers all day, the most valuable signal often lives in those conversations, locked in audio and text that no dashboard can read.
This is where Haricharan has pushed his recent work: building pipelines that turn unstructured content into structured, queryable data. He has designed and documented a production-grade multi-agent LLM architecture for an AI-powered call transcription analytics platform, one that uses language models not as conversational chatbots but as structured extraction engines that transform documents and transcripts into clean, actionable records. Applied to call-center data, that approach converts thousands of hours of conversation into structured information that teams can actually analyze.
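Treating a model as an extraction engine implies that its output is validated like any other incoming data. The sketch below covers only that last step, under stated assumptions: the model call itself is out of scope, and the `CallRecord` fields, the allowed outcomes, and the expected JSON shape are hypothetical rather than details of the platform described above.

```python
import json
from dataclasses import dataclass

# Hypothetical outcome labels the model is allowed to emit.
ALLOWED_OUTCOMES = {"resolved", "escalated", "unresolved"}
EXPECTED_KEYS = {"customer_need", "outcome", "issues"}


@dataclass(frozen=True)
class CallRecord:
    call_id: str
    customer_need: str
    outcome: str
    issues: tuple[str, ...]


def parse_extraction(call_id: str, model_output: str) -> CallRecord:
    """Validate a model's JSON extraction and turn it into a typed record.

    Raises ValueError when the output is not JSON or breaks the schema, so a
    malformed extraction is rejected instead of being loaded into a table.
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{call_id}: output is not JSON") from exc
    if not isinstance(data, dict):
        raise ValueError(f"{call_id}: expected a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"{call_id}: keys {sorted(data)} do not match {sorted(EXPECTED_KEYS)}")
    need = data["customer_need"]
    if not isinstance(need, str) or not need.strip():
        raise ValueError(f"{call_id}: customer_need must be a non-empty string")
    outcome = data["outcome"]
    if not isinstance(outcome, str) or outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"{call_id}: unknown outcome {outcome!r}")
    issues = data["issues"]
    if not isinstance(issues, list) or not all(isinstance(x, str) for x in issues):
        raise ValueError(f"{call_id}: issues must be a list of strings")
    return CallRecord(call_id, need.strip(), outcome, tuple(issues))
```

A well-formed response such as `{"customer_need": "change plan", "outcome": "resolved", "issues": ["long hold"]}` becomes a typed `CallRecord`; output that is not JSON, has missing or extra keys, or uses an unknown outcome raises `ValueError` and could be routed to a retry or review queue instead of the analytics table.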
"A call transcript is a goldmine that used to be unreadable at scale," Haricharan reflects. "The shift is using models to structure it: pull out what happened, what the customer needed, what went wrong, and put it in a form you can query like any other table."
When the Pipeline Is the Product
As AI moves into production, the pipeline stops being a backstage utility and becomes the product itself. The same Gartner research found that 63% of organizations either lack or are unsure of the data practices AI needs, which means the teams that get the pipeline right hold a real advantage. Reliability, lineage, and the freedom to add new sources without breaking the system become competitive features, not housekeeping.
Haricharan treats the pipeline with that seriousness, building in observability and monitoring so failures surface before they reach a model or a customer. His standing in the field reflects it: he has served as a judge for the Builders of Tomorrow AI Super Hackathon, evaluating how other engineers design AI systems under real constraints. Seeing many teams' approaches side by side informs how he thinks about building data platforms that are durable rather than fragile.
"In production, the pipeline stops being a backstage utility and becomes a core product capability. It is the product," he says. "If it goes down or drifts, the AI on top of it is worthless. So you build it to be watched, to recover, and to tell you when something is wrong."
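The "tell you when something is wrong" part can start as simple freshness and volume checks run on every batch. This is a minimal, hypothetical sketch: the `BatchStats` shape, the thresholds, and the baseline are assumptions, and recovery logic is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BatchStats:
    source: str
    rows: int
    last_event_time: datetime  # newest event timestamp seen in the batch


def health_alerts(stats: BatchStats, now: datetime, max_staleness: timedelta,
                  baseline_rows: float, tolerance: float = 0.5) -> list[str]:
    """Return human-readable alerts for one batch; an empty list means healthy.

    Freshness: alert if the newest record is older than max_staleness.
    Volume: alert if the row count differs from baseline_rows by more than
    the given fraction (tolerance).
    """
    alerts = []
    if now - stats.last_event_time > max_staleness:
        alerts.append(f"{stats.source}: stale, newest record at {stats.last_event_time.isoformat()}")
    if baseline_rows > 0 and abs(stats.rows - baseline_rows) / baseline_rows > tolerance:
        alerts.append(f"{stats.source}: row count {stats.rows} vs baseline {baseline_rows:.0f}")
    return alerts
```

A batch of 100 rows against a baseline of 1,000, whose newest record is three hours old under a one-hour staleness limit, produces two alerts; checks like these surface a silent upstream failure before a model consumes the gap.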
The Foundation Decade
The market is catching up to what engineers like Haricharan have long known. Spending on data integration is forecast to reach $33 billion by 2030, driven largely by the need to make unstructured, multi-source data ready for AI. The investment is flowing toward the layer that was treated as plumbing for years, as companies realize that AI ambitions rest on data foundations they neglected. What was once treated as a cost center is now core infrastructure.
For Haricharan, that is the shape of the next decade: the work moves from building models to building the systems that make models trustworthy at scale. Real-time delivery, structured extraction from unstructured content, and pipelines that hold up under production load are no longer support functions. They are where competitive advantage in AI is won or lost, and the engineers who can build them will decide which AI strategies survive contact with reality.
"The companies that win with AI will not be the ones with the fanciest models," Haricharan says. "They will be the ones who did the unglamorous work of getting their data right first. The model is the last mile. The foundation is everything before it."