ARCAS Systems
Chapter 8

The Cost of AI Getting Things Wrong

The reality

A factory in another region ran an ERP system in which autonomous AI agents managed the reordering of components. In one cycle, an agent changed a minimum order quantity from 100 units to 1,000. The change went through, and the supplier produced the larger run. By the time a human noticed the discrepancy, the order had already moved through cutting and finishing. The cost was paid in three layers: the direct cost of the unwanted inventory, the recovery cost of finding a buyer for the surplus, and the insurance cost, which the founder began paying afterwards in the form of human review steps that should have existed before the autonomous deployment.

This is the canonical case. The same pattern shows up in smaller forms every week in 10 to 50 person UAE service businesses. An AI tool sends a quote with the wrong margin. An agent commits to an SLA the team cannot meet. An autonomous workflow files a complaint as routine when it should have been escalated. The headline cost is rarely the largest one. The recovery cost usually is. The insurance cost (the human-in-loop infrastructure the team should have built before going autonomous) is what would have prevented all three failures.

Read this if

  • The team has an AI tool that runs without human approval at every step
  • An AI tool has produced an output the business had to absorb the cost of
  • The founder cannot say what the recovery cost would be if a current AI workflow produced a wrong output today
  • The team has not budgeted for the human review time needed to catch drift before clients do
  • An autonomous workflow has been deployed on an irreversible action (money, contracts, regulator filings)
  • The cost equation between AI, automation, and humans has not been calculated for any current AI tool

What dysfunction costs

Direct cost. What the wrong action cost in money or time the moment it happened. Order produced for the wrong volume. Customer told the wrong policy. Refund issued without authority. The number is whatever the wrong action cost when it executed.

Recovery cost. What it cost to fix. Hours of senior team time. Client trust to rebuild. Supplier negotiation to absorb the loss. In service business contexts, the recovery cost is usually 3 to 10 times the direct cost. A wrong quote sent for AED 200,000 (USD 54,500) above market is not an AED 200,000 (USD 54,500) problem. It is a 30 hour senior team conversation to walk back the price without losing the relationship, plus friction on every future negotiation with that client.

Insurance cost. What you should have spent on human-in-loop approval gates, monitoring, and review cycles to prevent the wrong action in the first place. Insurance cost is paid up front and looks expensive. Founders who skip it discover that the recovery cost was the more expensive line.

Reputation cost. The hardest to quantify. A client who experienced an AI-driven failure tells three other prospects. A client who experienced a human-handled failure tells fewer. The asymmetry is real. Klarna's 2025 walk-back from autonomous AI customer service was driven by exactly this asymmetry: the cost-per-interaction had dropped, and the customer satisfaction signal had dropped harder.

What success looks like

When the cost of AI getting things wrong is named:

  • Every AI workflow has a known direct, recovery, and insurance cost in dirhams
  • Every irreversible workflow has a human-in-loop step before execution
  • A monthly review samples 10 random outputs from each running AI workflow for drift
  • The team has a written policy on what the business honours when an AI surface makes a commitment
  • The cost equation (human vs automation vs AI-with-steward) has been calculated for the top three candidate workflows
  • A pilot has been cancelled or paused at least once because the cost of getting it wrong was higher than the team had budgeted to insure against
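The monthly sampling in the list above can be as small as the sketch below. It assumes each workflow output has some identifier the team can pull up later. The function name, the `quote-0001` style IDs, and the fixed seed are illustrative assumptions, not part of any particular tool.

```python
import random

def sample_for_review(output_ids, sample_size=10, seed=None):
    """Pick up to `sample_size` outputs at random for the monthly drift review."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for audit
    ids = list(output_ids)
    # If the workflow produced fewer outputs than the sample size, review them all.
    if len(ids) <= sample_size:
        return ids
    return rng.sample(ids, sample_size)

# Hypothetical month of 240 outputs from one workflow.
month_outputs = [f"quote-{n:04d}" for n in range(1, 241)]
picked = sample_for_review(month_outputs, seed=2026)
print(len(picked), "outputs queued for human review")
```

Drawing at random matters more than the tooling. A reviewer who picks the outputs by hand tends to pick the ones that already look fine.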

The framework

The cost of AI getting it wrong has three layers, plus a fourth that decides which technology runs which workflow at all.

Layer 1: Direct cost

The simplest layer. What did the wrong action cost the moment it happened? An invoice generated for AED 50,000 (USD 13,615) when the correct figure was AED 35,000 (USD 9,530). A quote sent with a 12 percent margin when the standard was 22. A supplier order for 1,000 units when the requirement was 100. The direct cost is whatever the wrong number was when the action executed. At scale the same pattern produces the Zillow case from 2021, where an iBuying algorithm that overpaid for homes contributed to losses of roughly USD 880M before the company shut the program down.

The behaviour to adopt this week: pick the AI workflow with the highest blast radius. Assume a 5 percent error rate, then multiply the number of wrong outputs per month by the average cost of one wrong output. That number is the direct exposure.
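A minimal sketch of that calculation, reading the 5 percent as an error rate across monthly runs. The workflow volume (120 invoices a month) is a hypothetical input. The AED 15,000 per wrong output is the gap between the AED 50,000 and AED 35,000 invoice figures above.

```python
def direct_exposure(runs_per_month, avg_cost_per_wrong_output_aed, error_rate=0.05):
    """Monthly direct exposure in AED: expected wrong outputs times the cost of each."""
    wrong_outputs = runs_per_month * error_rate
    return wrong_outputs * avg_cost_per_wrong_output_aed

# Hypothetical workflow: 120 invoices a month, AED 15,000 lost per wrong invoice.
exposure = direct_exposure(runs_per_month=120, avg_cost_per_wrong_output_aed=15_000)
print(f"Direct exposure: AED {exposure:,.0f} per month")
```

The error rate is an argument, not a constant, so the same function can be rerun once the team has a measured rate from its own review samples.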

Layer 2: Recovery cost

The hidden layer. Recovery is what it costs to fix a wrong action. The hours of senior team time spent on the conversation. The client meeting that would not have happened. The supplier negotiation that absorbs the loss. The internal review that figures out what went wrong and what to change.

For a service business, the recovery cost is usually 3 to 10 times the direct cost. A wrong proposal sent for AED 200,000 (USD 54,500) above market is not an AED 200,000 (USD 54,500) problem. It is a 30 hour conversation to walk back the price without losing the relationship, plus the friction it adds to every future negotiation with that client.

The behaviour to adopt this week: estimate the recovery time for the same workflow. Hours of senior team time, multiplied by the loaded hourly cost (typically AED 200 to AED 500, USD 54 to USD 136 per hour for senior staff). Add it to the direct cost.
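The staff-time part of that estimate can be sketched as a range, since the loaded hourly cost is itself a range. The function name is an assumption. The AED 200 to AED 500 band and the 30 hour example come from the text above.

```python
def recovery_cost_range(senior_hours, rate_low_aed=200, rate_high_aed=500):
    """Return (low, high) staff cost in AED for the senior hours spent on recovery."""
    return senior_hours * rate_low_aed, senior_hours * rate_high_aed

# The 30 hour walk-back conversation from the wrong-proposal example.
low, high = recovery_cost_range(30)
print(f"Recovery staff cost: AED {low:,} to AED {high:,}")
```

This prices only the hours. The friction on future negotiations with the same client is part of the recovery cost too, and it has to be estimated separately.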

Layer 3: Insurance cost

The layer that should have been paid first. Insurance cost is the human-in-loop approval gates, the monitoring, the weekly review cycles, the steward time that catches drift before the client does. It is paid in advance and looks expensive on the budget.

For a service business, the insurance cost runs roughly 20 to 30 percent of the AI tool's monthly cost: a senior team member spending one to two hours a week reviewing outputs, plus the monitoring infrastructure (logs, sample outputs, escalation rules). For an AED 5,000 (USD 1,360) per month AI tool, the insurance cost is roughly AED 1,000 to AED 1,500 (USD 270 to USD 410) per month.

The behaviour to adopt this week: take the direct cost from Layer 1 plus the recovery cost from Layer 2, and compare the total to the insurance cost. The insurance cost that should have been paid is rarely larger than the direct and recovery bill that arrives when insurance is skipped.
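A rough sketch of the 20 to 30 percent rule of thumb and the comparison it feeds. The function names and the exposure figure in the example are hypothetical. The AED 5,000 tool cost is the one used above.

```python
def insurance_band(tool_monthly_cost_aed, low_pct=20, high_pct=30):
    """Return the (low, high) monthly insurance budget in AED for an AI tool."""
    return (tool_monthly_cost_aed * low_pct // 100,
            tool_monthly_cost_aed * high_pct // 100)

def insurance_is_cheaper(monthly_insurance_aed, direct_aed, recovery_aed):
    """True when insuring costs no more than the exposure it guards against."""
    return monthly_insurance_aed <= direct_aed + recovery_aed

band_low, band_high = insurance_band(5_000)
print(f"Insurance budget: AED {band_low:,} to AED {band_high:,} per month")

# Hypothetical monthly exposure: AED 4,000 direct plus AED 12,000 recovery.
print(insurance_is_cheaper(band_high, direct_aed=4_000, recovery_aed=12_000))
```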

Layer 4: The cost equation in 2026

When is AI cheaper than humans? When is the human cheaper? When is automation cheaper than both? The math depends on volume.

Volume per month                                            | Lowest-cost technology
Under 40 occurrences                                        | Human
40 to 200 occurrences, repeatable input → repeatable output | Deterministic automation
40 to 200 occurrences, flexible output                      | AI with steward (LLM or agent)
Over 200 occurrences, flexible output                       | AI with steward, on agent or autonomous infrastructure

The crossover comes from how each option is priced. A senior team member at AED 12,000 (USD 3,270) per month doing 160 hours of work costs roughly AED 75 (USD 20) per hour, so the human cost scales with the hours a workflow actually consumes. An AI workflow plus steward might cost AED 7,000 (USD 1,910) per month all-in, largely regardless of volume. For low-volume work, the human is cheaper. For high-volume flexible work, AI plus steward beats the human. Deterministic automation beats both for any workflow that does not need flexibility.
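The table and the hourly figures above can be sketched as two small functions. Both are illustrations under stated assumptions. The lookup treats 40 and 200 as inclusive edges of the middle band and sends every repeatable workflow of 40 or more occurrences to deterministic automation, as the paragraph above suggests. The break-even assumes the AI cost is a fixed monthly amount while the human is costed per hour.

```python
def lowest_cost_technology(occurrences_per_month, needs_flexible_output):
    """Look up the lowest-cost option from the volume table."""
    if occurrences_per_month < 40:
        return "Human"
    if not needs_flexible_output:
        # Listed for 40 to 200 in the table; extended here to higher volumes.
        return "Deterministic automation"
    if occurrences_per_month <= 200:
        return "AI with steward (LLM or agent)"
    return "AI with steward, on agent or autonomous infrastructure"

def break_even_hours(ai_monthly_cost_aed, human_hourly_cost_aed):
    """Human hours per month above which the fixed AI cost becomes cheaper."""
    return ai_monthly_cost_aed / human_hourly_cost_aed

print(lowest_cost_technology(25, needs_flexible_output=True))
print(lowest_cost_technology(150, needs_flexible_output=False))
print(f"Break-even: about {break_even_hours(7_000, 75):.0f} human hours per month")
```

With the figures above, the AED 7,000 workflow only pays for itself once it replaces more than roughly 93 hours of human work a month.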

For the deeper distinctions, see AI vs Automation on determinism-vs-flexibility and Types of AI and Where They Pair on per-tier pricing.

A founder you might recognise

A founder runs a 28 person logistics support business in Jebel Ali. AED 9M (USD 2.4M) last year. The team supports import-export documentation for around 60 active SME clients. In Q1 2026 the founder rolled out an AI-assisted document classifier that read incoming customs paperwork and routed each document to the right team queue. The model was an LLM with a classification head, deployed inside the team's existing operations system.

The pilot ran without human-in-loop for the first 30 days. By day 21, the model had misclassified 14 documents, three of which sat in the wrong queue long enough to miss customs deadlines on the client side. Direct cost: roughly AED 18,000 (USD 4,900) in late-filing penalties absorbed by the business. Recovery cost: 60 hours of senior operations time across two weeks rebuilding the affected client relationships and writing the post-mortem, an estimated AED 24,000 (USD 6,535) in loaded staff cost. The reputational cost with two of the three affected clients was harder to name.

The founder rebuilt the workflow with a human-in-loop approval step on every classification with a confidence score below 90 percent. Insurance cost: roughly AED 3,500 (USD 950) per month in steward time. The classifier kept its accuracy. The penalties stopped. The direct and recovery costs from the pilot came to AED 42,000 (roughly USD 11,400), about twelve months of insurance cost. The founder now calls this the AI version of the old rule: build the safety in before the speed shows up.
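The approval step in this story is, at its core, a threshold check. A minimal sketch follows, assuming the classifier returns a queue name and a confidence score between 0 and 1. The `Classification` type, the queue names, and the document IDs are hypothetical; only the 90 percent threshold comes from the case.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.90  # below this, a human approves before routing

@dataclass
class Classification:
    document_id: str
    queue: str         # queue the model proposes
    confidence: float  # model confidence, 0.0 to 1.0

def route(c, threshold=REVIEW_THRESHOLD):
    """Send low-confidence classifications to a human, the rest to their queue."""
    if c.confidence < threshold:
        return "human_review"
    return c.queue

docs = [
    Classification("doc-001", "import_declarations", 0.97),
    Classification("doc-002", "export_permits", 0.72),
]
for d in docs:
    print(d.document_id, "->", route(d))
```

The threshold is a dial. Lowering it cuts steward time and raises exposure, so any change to it belongs in the monthly review, not in a quiet config edit.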

Working through it

  1. Pick the AI workflow with the highest blast radius. Map every irreversible action it can trigger. Quotes sent, contracts signed, money moved, regulator filings, client commitments. The list is the exposure surface.

  2. Calculate the direct cost of a 5 percent error rate. Wrong outputs per month, multiplied by the average cost of one wrong output. The number is the direct exposure.

  3. Calculate the recovery cost. Hours of senior team time per wrong output, multiplied by AED 200 to AED 500 (USD 54 to USD 136) per hour, multiplied by the wrong outputs per month. The recovery number is usually 3 to 10 times the direct cost.

  4. Calculate the insurance cost. Steward time per week (typically 1 to 4 hours), multiplied by the loaded hourly cost. Plus monitoring infrastructure if the workflow needs logs and sample reviews.

  5. Compare the three. Build the insurance first. If the direct plus recovery exposure exceeds three months of insurance cost, the insurance was always the right move. Add the human-in-loop step before scaling the workflow.
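Steps 4 and 5 can be sketched together: convert weekly steward time into a monthly insurance figure, then apply the three-month test. The 2 hours a week and AED 400 an hour are hypothetical inputs, chosen because they land near the AED 3,500 a month in the founder example. The source does not state the underlying hours or rate. The AED 18,000 and AED 24,000 are the direct and recovery costs from that same example.

```python
WEEKS_PER_MONTH = 52 / 12  # average weeks in a month

def monthly_insurance(steward_hours_per_week, loaded_hourly_rate_aed, monitoring_aed=0):
    """Monthly insurance cost in AED: steward time plus any monitoring spend."""
    steward = steward_hours_per_week * loaded_hourly_rate_aed * WEEKS_PER_MONTH
    return steward + monitoring_aed

def build_insurance_first(direct_aed, recovery_aed, monthly_insurance_aed, months=3):
    """Step 5: True when exposure exceeds `months` of insurance cost."""
    return direct_aed + recovery_aed > months * monthly_insurance_aed

insurance = monthly_insurance(steward_hours_per_week=2, loaded_hourly_rate_aed=400)
print(f"Insurance: about AED {insurance:,.0f} per month")
print(build_insurance_first(18_000, 24_000, insurance))
```

The result is only as good as the inputs from steps 2 and 3, so it is worth rerunning once real error counts replace the 5 percent assumption.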

Common mistakes

  • Skipping the cost calculation. A team that has not done the direct + recovery + insurance math will price the AI tool by its subscription line and ignore the rest. The recovery cost catches them up the first time something goes wrong.
  • Treating the insurance cost as overhead. Steward time is the operational difference between an AI workflow that works and one that drifts into client complaints. The cost is a feature.
  • Putting autonomous AI on irreversible workflows. The blast radius compounds with the autonomy level. An autonomous tool on an irreversible workflow can trigger a recovery cost the business cannot absorb. The case of the minimum order quantity changed from 100 to 1,000 is real and the pattern repeats.
  • Underestimating reputation cost. A client who experienced an AI-driven failure tells more prospects than a client who experienced a human-handled failure. The asymmetry shows up in retention curves over the next year.
  • Choosing capability over reliability. A more capable AI agent can do more things, including more wrong things, faster. Capability without a stewardship layer is the most expensive AI strategy a service business can run.

Self-assessment

Y or N for each.

  1. Have you calculated the direct cost of a 5 percent error rate in your highest-blast-radius AI workflow?
  2. Have you calculated the recovery cost in hours and dirhams for the same workflow?
  3. Is there a named human-in-loop approval step before any irreversible action in any current AI workflow?
  4. Do you have a steward (with weekly time blocked) for every running agent and autonomous tool?
  5. Has at least one AI workflow been paused or downgraded because the cost equation did not work?
  6. Could you name the cost equation crossover point (the volume above which AI plus steward beats the human, and below which the human is cheaper)?
  7. Has the team written down the policy on what the business honours when a customer-facing AI surface makes a commitment?

Five or more "yes" answers means the cost reality is named and the team is operating against it. Three or four is the band where the founder has done the math but the discipline has not landed across the team. Two or fewer means the next AI failure will be priced as it lands, and the recovery cost will be the largest line in the post-mortem.
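For teams that run the self-assessment across several founders or business units, the scoring bands reduce to a few lines. This is a sketch. The function name and the band labels are shorthand assumptions for the three outcomes described above.

```python
def assessment_band(answers):
    """Map seven Y/N answers to the band described in the self-assessment."""
    yes_count = sum(1 for a in answers if a.strip().upper() == "Y")
    if yes_count >= 5:
        return "cost reality named, team operating against it"
    if yes_count >= 3:
        return "math done, discipline not yet landed"
    return "next failure will be priced as it lands"

# Hypothetical answers to questions 1 through 7.
print(assessment_band(["Y", "Y", "N", "Y", "N", "N", "N"]))
```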