ARCAS Systems
15 min read · June 8, 2024

Holding the Line on Quality: Core Work


Why this matters

You did the work in the earlier chapters of this part. You ran the readiness check, picked use cases that mattered, brought your team along, and measured the impact. The tools are now real inside your business. People are using them. Output is faster. The before-and-after numbers look good.

Six months later, a quieter problem starts to show up. A client points out something in a deliverable that your team would have caught a year ago. A site report has a number that is plausible but wrong. The tone of your client emails has flattened. A team member tells you they have stopped reading the AI output carefully because "it is usually fine."

This is quality drift, and it is the most predictable failure mode in any business that adopted AI without designing the second part of the system. The first part of the system was adoption. The second part is stewardship. Most teams build the first and skip the second.

The diagnosis engine flags this through the Skills, Behaviour, and Power audits. When scores show declining output quality alongside strong tool adoption, the leak is here. The Five Levels model treats it as systems leakage compounded with people leakage. The system is doing more of the work. The people are doing less of the checking. The result is silent erosion that nobody owns.

A founder you might recognise

Yusuf runs a 38-person property management company in Jumeirah Lake Towers. He spent most of 2024 doing the AI rollout right. Started with one pilot, built to four. Trained his team. Bought the team workspace. By Q1 2025 his client report turnaround had dropped from three days to four hours. His tenant communications were faster. His operations team was getting home at a sensible hour.

Then a major commercial client called him on a Friday afternoon. They were renewing the contract and wanted to talk about a number in the last quarterly report. The number was wrong. Not by much. Just enough to make the trend look positive when the actual trend was flat. The client had run their own check. Yusuf had not. Nobody on his team had. The AI had pulled the wrong column from the data export and the report had gone out, formatted beautifully, with a number that was not true.

The client renewed. They also told Yusuf, politely, that they would now be checking every report. Yusuf went back through the last ninety days of reports. Three more had errors of the same shape. Plausible. Confident. Wrong. None of them had been caught because nobody on the team had been reading the reports the way they used to. The tool was usually right, and "usually" had become the standard without anyone deciding it should.

Yusuf did not have an AI problem. He had a stewardship gap. Nobody in his business had been given the job of holding the line on what quality looked like once the tools started doing the first draft.

The four shapes drift takes

Drift does not look the same in every business. It almost always shows up in one of four patterns. Run your own work against this list.

Plausible-but-wrong. The AI produces output that looks correct, formatted properly, with the right shape. The content is wrong in a specific, hard-to-catch way. A wrong number in a report. A misattributed quote. A client name from a different account. The format hides the error. This is the most dangerous pattern because it can run for months before a client catches it.

Voice flattening. Every email, every proposal, every internal note starts to sound the same. The variance that used to come from different team members thinking differently has been smoothed out. Your business is now writing in a single voice that belongs to a tool, not to your company. Clients notice this before you do.

Edge-case neglect. The tool handles 95 percent of cases well. The 5 percent that do not fit the pattern start getting handled by the same tool, badly, because nobody on the team is paying attention to which cases need a human. The unusual client. The complicated brief. The exception that used to get escalated. All of them now get the standard output, and the standard output is wrong for them.

Standard slippage. The work that goes out is acceptable. It used to be excellent. The team has quietly recalibrated to "the AI did most of it, this is fine" instead of "the AI did most of it, now make it actually good." Each individual piece of work is not a problem. The trend over twelve months is.

If you recognise more than one of these in your business, the work in this chapter is overdue.

The quality gates

A quality gate is a named human checkpoint between AI output and a destination that matters. Every workflow you have automated needs at least one. Most have zero, which is why drift compounds.

Design quality gates around three rules.

Every client-facing AI output passes through one. No exceptions. A report goes to a client only after a named team member has read it line by line. A proposal lands with a prospect only after the account owner has rewritten the parts that matter. The gate is not a click-through approval. It is a human reading the work with the attention they would give it if they had written it themselves.

Every internal data output that drives a decision passes through one. A revenue analysis that informs a hiring decision. A churn report that informs a pricing change. A site cost summary that informs a quote. If the number changes the decision, a human checks the number.

Edge cases route around the tool. Define what counts as an edge case in your business. The new client without a brief. The complicated commercial deal. The complaint that needs a personal response. These do not go through the AI workflow at all. They go to a named senior person who handles them by hand.

The point of a quality gate is not to slow the work down. It is to put a human between the tool and the consequence. The work still moves fast. It moves fast through known territory and slows down at the points where being wrong actually costs the business something.

Naming the steward

Adoption fails when no one owns it. Stewardship fails the same way. Every workflow you have automated needs a named person whose job is the quality of that workflow. Not "the team." A named person.

The steward has three responsibilities.

Set the standard. Write down what good looks like for this workflow. A client report has these five required elements, this tone, these checks. A site inspection note covers these five points and is signed off within 24 hours. The standard is a sentence, not a manual.

Run the weekly read. Pull a sample of the last week's output. Read it. Mark drift. Share what you found with the team in the next standup. Five minutes a week. The pattern is what matters, not any single output.

Hold the line in conversations. When a team member sends an AI draft that is below the standard, the steward says so. Not punitively. Specifically. "This report met the format check. The narrative section is generic. Rewrite the narrative the way you would have a year ago." The steward is the person who will not let the team's standard for "good enough" slip without a conversation.

In a 30-person business you probably need two or three stewards: for each workflow, the senior person closest to it. Not the founder, unless you want to be running quality gates yourself for the next five years.

The weekly drift check

A short ritual that catches drift in the week it happens, not the quarter. Run it every Friday with the stewards; it takes 20 minutes.

For each automated workflow, three questions.

What did we send out this week that we would not have sent out a year ago? Hard question. Honest answer. The answer is the drift.

What did a client, partner, or team member flag that the tool should have caught? External signals are gold. They are also rare, because most clients do not flag drift, they just quietly downgrade their opinion of you.

Where did we let "usually right" do the job that "actually right" should have done? This is the standard slippage check. Did the team accept output because the tool is usually fine, when on this specific item fine was not the standard you needed?

The answers go in a one-page log. Reviewed monthly. The log itself becomes a leading indicator. If the log is empty for two months, either drift has stopped or your stewards have stopped looking. Both are worth knowing.
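To make the log concrete, one entry might look like this. The details are invented for illustration, not a required format.

  • Week of 14 June. Client reports workflow. Standard slippage: the narrative section read as generic in two of six reports. Flagged by: steward's weekly read. Action: rewrite the narrative prompt, re-read next week's full batch.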

Building taste back into the team

The deeper work is not the gates or the checks. Those are the system. The work underneath the system is whether your team still has taste, and whether they are still using it.

Taste is the ability to read a piece of work and know whether it is good. It is built by reading good work, producing good work, getting feedback on bad work, and doing this for years. AI tools accelerate output. They do not build taste. If anything, heavy AI use without deliberate taste-building flattens the team's standard to the level of the tool.

Three practices keep taste alive.

The reading hour. Once a week, the senior team reads one piece of excellent work in your field. Not internal. External. A great client proposal from someone else. A great site report from a competitor. A great client email shared in confidence. They mark what makes it good. Twenty minutes of reading, twenty minutes of conversation. This builds the team's mental library of what excellent looks like.

The before-and-after review. Pull a piece of work the team produced this month. Show the AI first draft and the final version. Walk through what got changed and why. The conversation teaches the team what the human contribution actually was. If the answer is "nothing meaningful," that is the data. The team needs the practice of adding something.

The blind read. Once a quarter, take a piece of client-facing work the team produced and read it as if you were the client receiving it. No team explanation. No context. Just the work. What does it say. What does it miss. Would you renew based on this. The discomfort of reading your own work cold is the discomfort that protects the standard.

These practices do not scale to every workflow. They scale to the work that matters most. Pick the three or four outputs that define your business and protect the taste in those.

The honest conversation with the team

At some point the stewardship work means saying something to the team that they may not love hearing. The standard has slipped. We are going to spend more time on quality, even if it slows the work down. Here is what is changing and why.

This conversation works when it is honest about three things.

The slip was not their fault. It was a predictable result of a workflow change you made without designing the second half. The team was doing what the system rewarded. The system is changing.

The standard is non-negotiable. There is a quality bar this business holds. The reason clients pay you what they pay you is that bar. The bar is not for sale, even when the tools make it easier to drop.

The team's job changes shape. Less time producing first drafts. More time on the parts that matter most. Reading carefully, catching edge cases, holding the standard. The senior people will own quality gates. The tools handle the volume. The humans handle the consequence.

Do not have this conversation in an all-hands. Have it with your senior team first, individually. Then bring the structure to the wider team in a working session. The order matters. The senior people need to be the ones standing for the standard, not the founder talking past them.

Common mistakes

  1. Treating drift as an AI problem. It is a stewardship problem. The tool will keep doing what it does. The question is who in your business is paid to hold the line. Without a named human, drift is the default outcome.

  2. Assuming adoption metrics tell you anything about quality. Heavy adoption with no stewardship produces faster bad work. The dashboards say green. The clients quietly downgrade their renewals. Watch quality independently of adoption.

  3. Running the weekly drift check once and dropping it. Drift is a continuous force. Stewardship is a continuous practice. A one-time audit catches the current drift and misses everything that arrives over the next twelve months.

  4. Letting the founder be the only steward. The founder cannot be the quality gate for thirty workflows. The founder builds the system that makes other people the gates. If every drift conversation routes back to you, the system is not working.

  5. Confusing format with quality. AI is excellent at format. The output looks correct. "Looks correct" is not the same as "is correct". Quality gates check the substance, not the shape.

When to move on

Move into Part 6 when three things are true. Every automated workflow in your business has a named steward in writing. You have run the weekly drift check for at least four weeks and made one specific change based on what it surfaced. You have had the honest conversation with your senior team and they can each name the standard for the workflows they own.

If any of those is missing, the work in this chapter is not done. The tools will keep working. The standard will keep slipping until you address it.

Where to focus by team size

  • 10 to 19 people: You probably have one or two automated workflows. Name a steward for each. Run the weekly drift check yourself for the first month before handing it off.
  • 20 to 34 people: Build the full stewardship layer. Two to four named stewards. A weekly drift check that runs without you. A monthly review of the log.
  • 35 to 50 people: Push stewardship down a layer. Your senior team owns the gates. Your job is to run the monthly review with them and hold the line on the standard at that level.

Working prompts

People prompts

  • Who on your team has the strongest taste for what good work looks like in your business? Are they paid to use that taste, or is it being underused?
  • Which team member has stopped reading AI output carefully because "it is usually fine"? What conversation is overdue?
  • If your most senior person left tomorrow, would the standard go with them, or have they built it into the team?

System prompts

  • For each workflow you have automated in the last twelve months, can you name the human checkpoint between AI output and the client?
  • What is the format your team uses to flag drift when they spot it, and how often does anyone actually use it?
  • Where in your operating rhythm does quality get reviewed independently of adoption metrics?

AI prompts

  • Which prompts in regular use produce plausible-but-wrong output that the team has learnt to accept rather than fix?
  • Which workflows are now fully automated end to end with no human between the tool and the consequence? Are you comfortable with that?
  • How would you know within a week, not a quarter, that your AI output quality had dropped by 20 percent?

Founder exercise

Set aside 90 minutes. You will need a list of every AI-assisted workflow in your business and a sample of the last week's output from each.

Part A: The drift read (40 minutes)

  1. List every automated workflow in your business. The ones you remember and the ones the team built without telling you.
  2. For each, pull five to ten recent outputs. Read them line by line.
  3. Mark every output that has any of the four drift shapes: plausible-but-wrong, voice flattening, edge-case neglect, standard slippage.
  4. Count the marks. The count is your drift baseline.

Part B: The stewardship map (30 minutes)

  1. For each workflow, name a steward. The person who will own quality for that workflow from this week onwards.
  2. For each steward, write the standard for their workflow in one to three sentences. What good looks like, in plain language.
  3. Schedule the first weekly drift check. Block 20 minutes every Friday for the next eight weeks.

Part C: The senior team conversation (20 minutes to plan)

  1. Draft what you will say to your senior team about the drift you found and the stewardship layer you are building.
  2. Be honest about your part. The system was incomplete. You are completing it.
  3. Schedule the conversations within the next ten days. Senior team first, individually. Then the working session with the wider team.

ARCAS lens

The Five Levels model treats systems leakage as the second-most-expensive layer to repair, after people. AI without stewardship is a systems leak with a people leak underneath it: the tools are doing more, the humans are doing less of the checking, and quality erodes silently in the gap. The diagnosis engine catches this in the Skills audit (rising adoption alongside falling output quality) and the Behaviour audit (a team that has recalibrated to "usually right is fine").

People then Systems then AI is the order for adoption. Stewardship runs in the opposite order. The standard is set by people, held by a system, and applied to AI output. If you skip the people part of stewardship and try to enforce quality with a tool, you will end up with two tools watching each other while the standard quietly slips between them.

Holding the line is not a one-time decision. It is a practice that lives in your operating rhythm forever, the same way safety, ethics, or financial discipline does. The work is to make it visible, named, and continuous, so the team that comes after you knows the standard is real because it is held in front of them every week.


Start now: quick self-assessment

Rate each statement from 1 (never true) to 5 (always true).

  • Every AI-assisted workflow in my business has a named human steward in writing
  • The last ten client-facing AI outputs were read line by line by a senior team member before going out
  • I run a weekly drift check across my automated workflows, with a one-page log
  • My senior team can name the quality standard for the workflows they own, in plain language
  • Edge cases route around the AI workflow to a named senior person, not through the standard tool
  • I have had an honest conversation with my team about quality drift in the last 90 days

Score 24 or above: Stewardship is real in your business. Keep the rhythm.

Score 15 to 23: The gates exist in some places and not others. The drift is sitting in the gaps. Run the founder exercise this week.

Score below 15: The work in this chapter is overdue. Pick one client-facing workflow, name a steward, and start with that one before the next client catches the drift you have not seen yet.