Why Restraint Is the Most Underrated Skill in AI Development

Author
Christie Pronto
Published
April 27, 2026

Why Restraint Is the Most Underrated Skill in AI Development

At some point in the last year, you probably approved an AI feature you were not fully sure about. 

Maybe the board asked about your AI roadmap and you needed an answer. 

Maybe a competitor shipped something and the silence on your end felt like falling behind. 

Maybe the development team said it was possible and the timeline was reasonable and the pressure from above made yes feel like the only defensible choice.

That feature is either live now or it is in progress. 

And there is a version of this story where it works well, where users adopt it, where the inference costs land inside what was projected, where the behavior is reliable enough that nobody on your support team is fielding complaints about what it said. 

That version exists.

There is another version where you are six months post-launch looking at adoption numbers that have not moved, or an infrastructure bill that came in higher than anything that was modeled, or a support ticket from a user who trusted the AI output in a situation where being wrong actually mattered. 

That version is more common than the announcements suggest, and the reason it does not get talked about openly is that the decision to build is hard to reverse and even harder to explain to the people who approved it.

The difference between those two versions usually comes down to whether someone asked the hard questions before the build started. 

Not about technical feasibility. About what it would actually cost when things went wrong.

What the Market Is Actually Rewarding Right Now

Klarna is the example that traveled fastest because the results were announced so publicly. In 2024 the company replaced hundreds of customer service agents with AI, published the efficiency data, and the story became a reference point for what AI could do at scale. 

By 2025 they were rehiring human agents because the AI could not handle real disputes and customers with situations that fell outside what the system was designed for.

The rehiring got less coverage than the launch. 

That asymmetry is exactly the incentive structure you are operating inside. The market rewards the announcement of AI ambition and the consequences arrive later, with less fanfare, on a balance sheet rather than a press release. 

By the time the cost lands, the original decision is long past and the team that made it has moved on.

For a company at Klarna's scale, that kind of reversal is expensive and embarrassing. For a growth-stage company, the math is worse. 

You do not have the runway to absorb a development cycle spent building something users will not adopt, or the user base large enough that one cohort's bad experience with an AI output gets diluted before it affects retention. 

The mistakes that large companies recover from are the ones that set smaller companies back by a year.

What Happens When Nobody Says No

When Air Canada's chatbot gave a passenger incorrect information about bereavement fares, the airline argued in court that the chatbot was a separate entity and therefore they bore no responsibility for what it said. 

A tribunal rejected that and held them liable. The case became a reference point for AI liability, but what it actually illustrates is a product decision that was never fully examined before launch.

Somebody deployed that system into a customer service context, which means a context where users are making real decisions based on what they are told, without fully mapping what would happen when the AI was wrong. 

In a bereavement fare situation, the user is grieving, under time pressure, and trusting the system to give them accurate information at a moment when being wrong has a real personal and financial cost. 

That is not a context where you get to discover the edge cases after launch. The cost of that discovery landed on the customer first, and then on the company in court.

Across the industry the same failure plays out every day in ways that never surface publicly but still carry real cost:

  • A growth-stage SaaS company ships an AI assistant for their users and watches engagement drop in the cohort that used it most, because two or three bad outputs early in the experience were enough to make users stop trusting it entirely
  • A content platform automates production with AI before stress-testing outputs at scale, and spends the next quarter doing damage control on the pieces that should not have been published
  • An internal operations tool gets built on a model whose inference costs were scoped at low usage, and when the team actually adopts it the bill triples in sixty days

In each of those situations the technology did what it was built to do. The failure was in the decision made before a line of code was written, by a team that either did not ask what it would cost when things went wrong or did not have someone in the room willing to make the question uncomfortable.

What Restraint Actually Looks Like in Practice

The question that separates a disciplined AI build from an anxious one is not technical. It is: what would a user have to believe about this feature for it to change their behavior? 

If the answer requires trusting something the system cannot yet reliably deliver, that gap does not close after launch. 

It widens, because users who encounter it early form opinions that are difficult to reverse.

Duolingo is a useful contrast because they have been deliberate enough about this to produce a visible pattern. 

Their AI integrations have targeted the specific places where the technology adds something the existing product genuinely could not do. The parts of the product that already worked well were left alone. 

That is not caution for its own sake. It is the recognition that a feature with real adoption is worth more than an ambitious one that users try once and abandon, and that the bar for shipping should be whether the feature is reliable enough to be trusted, not whether it is technically possible.

The companies that replaced entire content pipelines or customer-facing systems with AI before assessing output quality at scale made a different calculation. 

They discovered the answer to the reliability question in production, which meant their users discovered it first. 

The trust damage from that is harder to repair than the original system would have been to maintain.

We use AI extensively in our own processes, in how we scope, document, research, and build. That experience is what makes the line visible to us. There are places where AI accelerates the work without introducing risk to the outcome, and places where the variability in what it produces is exactly the wrong thing to introduce into a client's system. 

Knowing where that line falls, and being willing to say so clearly, is the part of the job that has nothing to do with technical capability. 

We believe that business is built on transparency and trust, and that good software is built the same way. 

An AI feature that your users do not trust is not a feature. It is a liability.

When we push back on AI scope with a client, these are the trade-offs we are putting on the table:

  • The runway going into a feature that users will not adopt is not recoverable, and at a growth stage that runway has a real cost measured in what else it could have funded
  • A user who encounters a bad AI output in a high-stakes moment and loses confidence in the product does not typically give it another chance
  • An architecture that cannot support real usage patterns is far more expensive to fix after launch than before it, because by then the system is live and the fix requires working around what is already in production

Getting that right before the build starts requires someone willing to make the conversation harder than it needs to be. 

That is the job.

Why Judgment Is the Actual Deliverable

The ability to build AI features is no longer what separates development teams. That capability is widely available and the tooling has made it faster than it has ever been. 

What remains genuinely rare is the judgment to know whether a given AI application will hold up under real conditions, with real users, at the scale the product will actually reach.

In practice, that judgment looks like:

  • Knowing which problems AI handles reliably and which it handles badly, and being willing to name that clearly when the client is excited about a direction that sits in the wrong category
  • Understanding that a feature behaving unpredictably in 5% of cases is not a 95% success story, it is a trust problem that surfaces in your worst-timed support tickets and your hardest-to-explain churn data
  • Recognizing that the inference cost model that works at a hundred users does not automatically work at ten thousand, and that discovering this post-launch is a different kind of problem than modeling it before

A development team that stays accountable for outcomes past launch builds this judgment because the feedback loop is unavoidable. 

They are the ones who hear about the inference bill, who get asked why the feature is not being used, who have to explain what happened when an output was wrong. That accountability changes how they build. 

A team that hands off at launch and moves on to the next engagement does not carry that feedback and does not develop it.

We host and maintain what we build. That means we are in the feedback loop whether we want to be or not, and that sustained accountability is exactly what makes the hard conversation before the build worth having every time. 

The firms worth working with are the ones where that accountability is structural, not incidental.

How to Know If Your Development Partner Has This Skill

The best time to find this out is before you have signed anything. Here is what to look for.

Have they ever talked a client out of an AI feature? A team with real judgment has done this and can tell you the specifics, what the feature was, what the concern was, and what happened as a result. A team that says yes to every AI ask is not protecting you. They are getting paid to build.

Can they walk through the cost model of what they are proposing beyond the build itself? Inference costs at real usage, the ongoing work of keeping model behavior calibrated, the support load that comes from edge case outputs. If they cannot speak to those numbers before the contract is signed, those numbers will surprise you after.

Do they have a clear position on where AI belongs in your specific product and where it does not? A firm without a point of view on this has not thought hard enough about your situation. You want a partner who will tell you what they would not build and why, not just what they can.

What does their process look like for validating AI behavior before real users encounter it? Beyond standard QA, the specific work of finding the edge cases before your users do, where inputs are unexpected and the stakes of a wrong answer are high. If the answer is vague, the edge cases will be your problem to discover.

The pressure to build AI into your product is real and it is not going away. 

The board question is not going to stop, the competitor announcements are not going to slow down, and the expectation that serious companies are doing something with AI is now baked into how growth-stage businesses get evaluated.

That pressure is not the problem. 

The problem is when it drives decisions that skip the questions that matter, and when the team you have hired to build is not the team willing to ask them.

The AI work that holds up is the work where someone got uncomfortable in a room before the build started. Finding that person, or that firm, is worth more than finding the team that will move the fastest.

Author
Christie Pronto
Published
April 27, 2026

Check out the BIZ/DEV podcast

Our weekly tech podcast focusing on AI, our industry, the founder's journey, and more.

biz/dev podcast
Free Strategy Session