
Every enterprise team that wired a large language model into its own data in 2025 ran into the same quiet problem.
The model could read the database. It could generate SQL. It could summarize tables.
Yet when leaders asked questions that mattered to the business, the answers drifted. Not wildly. Just enough to be dangerous.
That drift did not come from model error. It came from something far more basic. The system did not know what its own data meant.
Modern companies have spent years optimizing how data is stored, moved, and queried. Very few have ever been forced to agree on what that data actually represents.
AI made that invisible gap suddenly visible.
When a machine starts answering business questions in natural language, it inherits every unresolved definition, every quiet assumption, and every internal disagreement baked into the data.
That is where things begin to break.
The result is not just wrong answers. It is a loss of confidence in the system itself.
Executives will not accept numbers that cannot be explained. Operations teams will not act on answers they cannot reproduce.
Data teams get pulled back into ticket queues to validate what the AI says.
The interface feels new, but the foundation underneath is still fragile.
On paper, enterprise data looks structured. Tables. Schemas. Clean joins.
In practice, what makes that data useful lives outside the database. It lives in how teams interpret it.
That gap was mostly invisible when humans were the ones writing queries. AI made it obvious.
Databricks ran into that gap when it tried to push Unity Catalog deeper into its AI tooling in 2025. Teams wanted LLMs to query lakehouses directly. They quickly learned that what works for analytics does not work when a machine has to understand what it is reading. Tables were optimized for storage, not interpretation.
Column names made sense to the teams who built them, but not to a system that has never been in the room.
An AI asked to calculate churn must know which customers count, which cancellations matter, and which billing states represent exits versus pauses. None of that lives in SQL. It lives in a messy mix of business rules, half-documented logic, and what people on the team just know.
When the model issues a query without those rules, it will still return a number. The problem is that number does not line up with how the business actually thinks about churn.
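To make that concrete, here is a minimal sketch of what writing those rules down might look like. Every state name and threshold below is a hypothetical example, not any real system's schema; the point is that the definition lives in one reviewable place instead of being reinvented inside each generated query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative rules only: they are written down once, not guessed per query.
EXIT_STATES = {"canceled", "expired"}       # billing states that count as exits
PAUSE_STATES = {"paused", "payment_retry"}  # a pause is not an exit
GRACE_PERIOD = timedelta(days=14)           # early exits are failed onboarding, not churn

@dataclass
class Subscription:
    customer_id: str
    billing_state: str
    started_at: date
    ended_at: date | None

def is_churned(sub: Subscription) -> bool:
    """Apply the agreed churn definition instead of letting each query improvise."""
    if sub.billing_state in PAUSE_STATES:
        return False
    if sub.billing_state not in EXIT_STATES:
        return False
    if sub.ended_at is None:
        return False
    # Exits inside the grace period are excluded by convention.
    return (sub.ended_at - sub.started_at) > GRACE_PERIOD
```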
This is why hallucinations appear in enterprise AI.
The model is not inventing data. It is inventing interpretation. It is doing what humans have always done when context is missing. It fills in the blanks.
Without a way to store those interpretations, every answer becomes a guess the business cannot defend.
Once teams started seeing how easily AI could misread their own data, the focus began to shift.
The question stopped being what the model could do and started being what the system around it actually meant.
By mid-2025, Snowflake, Databricks, and OpenAI all converged on the same idea from different directions.
They stopped treating prompts as text and started treating them as interfaces. Structured outputs, tool calling, and semantic models forced teams to define what a question could mean before it was ever asked.
What mattered was not who shipped the feature. It was that a new layer had to exist.
Snowflake’s semantic layer became the clearest example of that shift. It held mappings between business terms and physical tables. When someone asked about revenue, the layer decided which tables and which columns represented that concept.
That decision was no longer buried in a dashboard. It lived in the system itself.
Everything else followed from that same idea. Permissions, lineage, and response formats all moved into that layer because that is where meaning had to live if AI was going to be trusted.
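A rough sketch of the kind of mapping such a layer holds, with illustrative terms and table names rather than Snowflake's actual model format. The business term resolves to physical columns, filters, and allowed roles before any SQL is generated, and an unknown term or unauthorized role gets a refusal instead of a guess.

```python
# Illustrative mapping from business terms to physical definitions.
SEMANTIC_MODEL = {
    "revenue": {
        "table": "finance.recognized_revenue",
        "measure": "SUM(amount_usd)",
        "filters": ["status = 'settled'", "is_test_account = FALSE"],
        "allowed_roles": {"finance", "executive"},
    },
    "active_customers": {
        "table": "crm.accounts",
        "measure": "COUNT(DISTINCT account_id)",
        "filters": ["subscription_status = 'active'"],
        "allowed_roles": {"finance", "sales", "support"},
    },
}

def resolve(term: str, role: str) -> dict:
    """Look up the agreed definition for a business term, or refuse outright."""
    entry = SEMANTIC_MODEL.get(term)
    if entry is None:
        raise KeyError(f"no agreed definition for {term!r}")
    if role not in entry["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not query {term!r}")
    return entry
```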
Business intelligence tools were built for a world where questions were fixed. You decided what mattered, built a chart, and reviewed it every week. That worked when the business moved slowly and the questions stayed the same.
AI flipped that model. People started asking new questions on the fly.
They expected the system to understand them.
Tableau and Power BI dashboards could not adapt to that. They encoded assumptions about how data should be grouped and summarized.
When users asked something slightly different, they fell back to exporting data or opening tickets. Dashboards still worked; they just were not built for the way people had started asking questions.
Shopify’s Sidekick did not replace dashboards by drawing better charts. It replaced them by moving the logic of interpretation into the system itself. When a merchant asked why sales dropped, Sidekick had to understand promotions, refunds, inventory, and seasonality. That understanding did not live in a visual.
It lived in the same kind of context layer the rest of the industry was slowly being forced to build.
Once teams started trying to use AI for real decisions, it became clear that the problem was not the model but the data and rules it was being allowed to see.
By early 2026, the companies that were still running AI against raw tables had quietly started pulling back. They did not announce failures. They just stopped talking about the assistants they had shown off months earlier. The ones that kept going did something different.
They built a layer between the model and the data that treated meaning as a first-class asset.
That shift shows up most clearly when you look at how financial systems handled it. When Stripe expanded its LLM-driven revenue analytics for finance teams, it did not allow the model to touch ledgers directly. Every query passed through a strongly typed financial schema that defined what a charge, a refund, a dispute, and a payout meant in Stripe’s business language. The model never saw a raw table.
It saw concepts that were already reconciled across systems. That is what made the answers usable without someone having to step in and reconcile them by hand.
That layer also enforced permissions. A support agent could ask about a customer’s last invoice. They could not ask about the company’s cash position. The same model served both.
The system decided what each role was allowed to know, instead of leaving that up to the model.
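A small sketch of what that pattern can look like, under assumed names. Nothing here is Stripe's real schema; it just shows reconciled concepts standing between the model and the ledger, plus a gate that decides which concepts each role may ask about.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass(frozen=True)
class Charge:
    id: str
    amount: Decimal          # gross amount, before fees
    settled_on: date | None  # None until the funds actually settle

@dataclass(frozen=True)
class Refund:
    charge_id: str
    amount: Decimal

# Which reconciled concepts each role may ask about; the system enforces this,
# not the model.
ROLE_CONCEPTS = {
    "support_agent": {"invoice", "refund"},
    "finance": {"invoice", "refund", "charge", "payout", "cash_position"},
}

def can_query(role: str, concept: str) -> bool:
    """Return whether a role is allowed to ask about a given concept."""
    return concept in ROLE_CONCEPTS.get(role, set())

def recognized_revenue(charges: list[Charge], refunds: list[Refund]) -> Decimal:
    """Revenue as the business defines it: settled charges minus their refunds."""
    settled = {c.id: c.amount for c in charges if c.settled_on is not None}
    returned = sum((r.amount for r in refunds if r.charge_id in settled), Decimal(0))
    return sum(settled.values(), Decimal(0)) - returned
```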
On the surface, generic AI tools look powerful. They connect to your database, pull your schemas, and start answering questions right away.
That speed is exactly what makes them risky, because it skips the one step that keeps teams aligned: agreeing on what the data actually means. The model is left to guess, and that holds up until someone asks a question that crosses departments. In logistics systems, late shipments, canceled orders, and partial deliveries often live in different tables.
An AI that does not know which of those states matters to operations will combine them and produce a number that no team recognizes. One group calls it late, another calls it an exception, and a third quietly drops it because it was “close enough.”
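A sketch of the classification step that is missing in that failure, using made-up state names. Deciding which raw delivery states belong to which reporting bucket is a business decision, so it is written down rather than left to the model.

```python
# Hypothetical delivery states, bucketed the way operations reports on them.
LATE_STATES = {"delivered_after_sla", "in_transit_past_sla"}
EXCEPTION_STATES = {"customs_hold", "weather_delay"}
IGNORED_STATES = {"canceled_before_pickup"}  # never shipped, so never late

def late_shipment_bucket(state: str) -> str:
    """Map a raw delivery state to the agreed reporting bucket."""
    if state in LATE_STATES:
        return "late"
    if state in EXCEPTION_STATES:
        return "exception"
    if state in IGNORED_STATES:
        return "ignored"
    # Unknown states are surfaced instead of being silently merged into a total.
    return "needs_review"
```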
The same thing happens in finance once money moves through more than one state. Charges, refunds, disputes, credits, payouts, and write-offs are all in motion.
When a tool treats those as simple totals, it will confidently produce revenue numbers that finance cannot reconcile.
Project-driven teams feel it too. In construction and field ops, “budget” might mean contract value, approved change orders, committed cost, or cash spent.
If the tool grabs the wrong one, the answer looks clean but no one can actually use it.
Healthcare systems learned this the hard way in 2025 when compliance teams blocked LLMs from touching patient data without full audit trails. A model that cannot explain which records it accessed becomes a liability. Without a context layer that logs every query, masks sensitive fields, and enforces policy, AI cannot be trusted in regulated environments.
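A minimal sketch of the kind of wrapper that implies, with invented field names and an in-memory list standing in for a real audit store. Every access on the model's behalf is recorded with the records it touched, and sensitive fields are masked before any row reaches the model. A call like run_for_model(lambda: fetch_lab_results(patient_id), role="clinician", purpose="summarize recent labs"), where fetch_lab_results is a placeholder, would leave a record of exactly what the model saw.

```python
import time

# Illustrative policy: fields the model must never see in the clear.
SENSITIVE_FIELDS = {"patient_name", "ssn", "date_of_birth"}

AUDIT_LOG: list[dict] = []  # stand-in for an append-only audit store

def run_for_model(fetch_rows, role: str, purpose: str) -> list[dict]:
    """Run a data access on the model's behalf, logging it and masking fields."""
    rows = fetch_rows()
    AUDIT_LOG.append({
        "ts": time.time(),
        "role": role,
        "purpose": purpose,
        "record_ids": [row.get("record_id") for row in rows],
    })
    return [
        {key: ("***" if key in SENSITIVE_FIELDS else value) for key, value in row.items()}
        for row in rows
    ]
```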
The failure mode is quiet. Teams stop asking the AI hard questions. They only use it for summaries and drafts.
The moment it touches numbers that drive decisions, humans step back in.
The teams that get the most out of AI approach it the same way they approach the rest of their core systems. They treat their semantic models as products. They version them. They test them. They know that changing how revenue is defined will change every answer the AI gives.
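In practice that can be as unglamorous as a version field and a test. The definition below is a hypothetical example; the test exists so that changing how revenue is computed becomes a deliberate, reviewable change rather than a silent one.

```python
# A hypothetical revenue definition, versioned like any other product artifact.
REVENUE_DEFINITION = {
    "version": "2.3.0",
    "measure": "SUM(amount_usd)",
    "filters": ["status = 'settled'", "is_test_account = FALSE"],
}

def test_revenue_definition_is_pinned():
    # Changing the measure or filters means updating this test and the version
    # together, which makes the change visible in review.
    assert REVENUE_DEFINITION["version"] == "2.3.0"
    assert REVENUE_DEFINITION["measure"] == "SUM(amount_usd)"
    assert "status = 'settled'" in REVENUE_DEFINITION["filters"]
```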
At Databricks, Unity Catalog became the control plane for AI because it was the only place where data, permissions, and meaning could be managed together. Engineers were no longer just building pipelines.
They were shaping how the business actually understands its own data.
OpenAI’s enterprise customers took the same approach with structured outputs. By forcing models to return answers that matched predefined schemas, they eliminated entire classes of ambiguity.
The model could only respond in ways the business had already agreed made sense.
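A sketch of that idea using pydantic for the schema check. The field names are assumptions, and the exact way a given SDK enforces a schema varies; the point is that an answer which does not fit the agreed shape never reaches the user.

```python
from __future__ import annotations

from pydantic import BaseModel, ValidationError

class RevenueAnswer(BaseModel):
    metric: str               # must name an agreed metric, e.g. "recognized_revenue"
    period: str               # e.g. "2025-Q4"
    amount_usd: float
    definition_version: str   # ties the answer back to the semantic model used

def parse_model_output(raw_json: str) -> RevenueAnswer | None:
    """Accept the model's answer only if it matches the agreed schema."""
    try:
        return RevenueAnswer.model_validate_json(raw_json)
    except ValidationError:
        return None  # a malformed answer is dropped, not shown to the user
```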
The interface no longer defines the software. The context does. Whoever owns how data is interpreted owns how the business runs. That is why semantic layers, embeddings, and permission models became the new core of AI systems.
We believe that business is built on transparency and trust. We believe that good software is built the same way.
When meaning is clearly defined and carried through the system, people can actually use the answers they get.
When it is not, even the best model ends up producing numbers no one feels comfortable acting on.
The companies that understand this will not chase the next model. They will spend their time making sure the data those models touch reflects how their business really works.
That is the layer that makes AI something teams can rely on.
