ARCAS Systems
11 min read · February 23, 2025

Measuring AI Impact: Core Work

Working page for Measuring AI Impact.

Why this matters

Founders typically buy AI tools the way they buy gym memberships. They sign up, use them for two weeks, then keep paying because cancelling feels like giving up. The subscription keeps running. The team half-uses it. Nobody checks whether anything actually changed.

Saving 150 hours a month means nothing if revenue stays the same. Time saved is an activity metric. Revenue per person is an impact metric. If your team uses AI to write proposals faster but your close rate stays flat, you have not gained anything. You have just moved the bottleneck.

This chapter gives you a measurement system that connects AI spending directly to revenue and cost outcomes. It maps to the Revenue Model and Conversion audits in the ARCAS diagnosis, and to cost leakage in the Five Levels model. If an AI tool does not register in one of those two places, it is not worth measuring.

A founder you might recognise

Last year, the founder of a 35-person commercial pest control firm in Sharjah subscribed to four AI tools after the AI Readiness chapter: an AI proposal writer at AED 1,200 (USD 327) per month, a scheduling assistant at AED 800 (USD 218) per month, a WhatsApp chatbot for client queries at AED 500 (USD 136) per month, and a document summariser at AED 600 (USD 163) per month. Total: AED 3,100 (USD 844) per month, or AED 37,200 (USD 10,128) per year.

Three months in, he felt good about the tools. His team said proposals were "easier." The chatbot answered basic questions. But when his operations manager pulled the numbers, nothing had moved. Revenue per employee was the same. The proposal win rate had not changed. Client response time improved by 20 minutes on average, but no client had mentioned it or increased their contract size.

He was spending AED 37,200 (USD 10,128) per year on tools that made work feel different but did not make the business perform differently. That is the gap this chapter closes.

The before/after measurement framework

You cannot measure impact without a baseline. Before you turn on any AI tool, write down three numbers:

  1. Revenue per employee per month. Total monthly revenue divided by headcount. This is your primary number.
  2. Cost to deliver per job or project. What you spend in labour and materials to complete one unit of work.
  3. Conversion rate. Proposals sent versus proposals won, or leads received versus jobs booked.

Record these in a simple Excel sheet. Date-stamp them. These are your "before" numbers.

After 60 days with the AI tool running, pull the same three numbers. If none of them moved, the tool is not working. It might be saving time, but saved time that does not convert to revenue or reduced cost is just comfort.
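The before/after comparison can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Snapshot` class, its field names, and the `tool_is_working` helper are hypothetical, and the sketch assumes that any improvement at all counts as "moved" (in practice you may want a minimum threshold).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One date-stamped reading of the three baseline numbers (amounts in AED)."""
    taken_on: date
    monthly_revenue: float
    headcount: int
    delivery_cost: float   # labour + materials spent on completed jobs this month
    jobs_completed: int
    proposals_sent: int
    proposals_won: int

    @property
    def revenue_per_employee(self) -> float:
        return self.monthly_revenue / self.headcount

    @property
    def cost_per_job(self) -> float:
        return self.delivery_cost / self.jobs_completed

    @property
    def conversion_rate(self) -> float:
        return self.proposals_won / self.proposals_sent


def compare(before: Snapshot, after: Snapshot) -> dict:
    """Change in each number, computed as after minus before."""
    return {
        "revenue_per_employee": after.revenue_per_employee - before.revenue_per_employee,
        "cost_per_job": after.cost_per_job - before.cost_per_job,
        "conversion_rate": after.conversion_rate - before.conversion_rate,
    }


def tool_is_working(before: Snapshot, after: Snapshot) -> bool:
    """True if at least one number moved in the right direction."""
    change = compare(before, after)
    return (
        change["revenue_per_employee"] > 0
        or change["cost_per_job"] < 0
        or change["conversion_rate"] > 0
    )


# Illustrative figures only, not taken from a real business.
before = Snapshot(date(2025, 1, 1), 700_000, 35, 420_000, 120, 50, 11)
after = Snapshot(date(2025, 3, 2), 700_000, 35, 420_000, 120, 50, 11)
print(tool_is_working(before, after))
```

The same three columns in a date-stamped Excel sheet do the same job. The point of the sketch is that the comparison is mechanical once the "before" row exists.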

What to measure and what to ignore

Impact metrics (these matter):

  • Revenue per employee per month
  • Gross margin per project
  • Proposal-to-close ratio
  • Customer acquisition cost
  • Repeat purchase rate

Activity metrics (these do not matter on their own):

  • Hours saved per week
  • Number of AI-generated proposals
  • Messages handled by chatbot
  • Documents summarised

Activity metrics are inputs. Impact metrics are outputs. Track both, but make decisions on outputs only.

AI ROI calculation for service businesses

The formula is simple. The discipline to use it is not.

Monthly AI cost = subscription fee + (hours your team spends learning and maintaining the tool x their hourly rate)

Founders typically forget the second part. At the Sharjah firm, the team spent 5 hours per month managing the chatbot and 5 hours per month editing AI proposals. That was 10 hours at roughly AED 75 (USD 20) per hour, or AED 750 (USD 204) in hidden cost on top of AED 3,100 (USD 844) in subscriptions. The real monthly cost was AED 3,850 (USD 1,048).

Monthly AI gain = increase in revenue attributable to the tool + decrease in cost attributable to the tool

If the proposal tool helped close two extra contracts worth AED 15,000 (USD 4,085) each, the gain is AED 30,000 (USD 8,170). The ROI is clear. But if nothing changed, the gain is zero and the tool costs AED 3,850 (USD 1,048) per month.

ROI formula: (Monthly gain - Monthly cost) / Monthly cost x 100

The chatbot: AED 0 gain minus AED 875 (USD 238) cost (AED 500 (USD 136) subscription + AED 375 (USD 102) maintenance) = negative AED 875 (USD 238). That is a -100% ROI. Kill it.

The proposal writer, after the operations manager trained the team properly and they started using it with their actual win/loss data: AED 30,000 (USD 8,170) gain minus AED 1,575 (USD 429) cost (AED 1,200 (USD 327) subscription + AED 375 (USD 102) maintenance) = AED 28,425 (USD 7,740) net gain. That is an ROI of roughly 1,805%. It is worth keeping.
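The arithmetic above can be checked with a short Python sketch. The function names are hypothetical. The 5 maintenance hours per tool are inferred from the AED 375 maintenance figure at roughly AED 75 per hour.

```python
HOURLY_RATE_AED = 75  # rough team hourly rate used in the text


def monthly_cost(subscription: float, maintenance_hours: float,
                 hourly_rate: float = HOURLY_RATE_AED) -> float:
    """Subscription fee plus the hidden cost of team time spent on the tool."""
    return subscription + maintenance_hours * hourly_rate


def roi_percent(monthly_gain: float, cost: float) -> float:
    """(Monthly gain - Monthly cost) / Monthly cost x 100."""
    return (monthly_gain - cost) / cost * 100


# Whole stack: AED 3,100 in subscriptions plus 10 hours of team time.
print(monthly_cost(3100, 10))                     # 3850

# Chatbot: no measurable gain, AED 500 subscription + 5 hours (AED 375).
chatbot_cost = monthly_cost(500, 5)               # 875
print(roi_percent(0, chatbot_cost))               # -100.0

# Proposal writer: AED 30,000 gain, AED 1,200 subscription + 5 hours (AED 375).
proposal_cost = monthly_cost(1200, 5)             # 1575
print(round(roi_percent(30_000, proposal_cost)))  # 1805
```

Any tool with zero attributable gain lands at exactly -100%, whatever it costs. That is why the gain side of the formula deserves more attention than the subscription price.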

When to kill an AI experiment

Set these rules before you start, not after you have spent three months hoping it will work.

The 60-day rule. Every AI tool gets 60 days. Not 90. Not "let's give it one more month." Sixty days with a specific success metric written down on day one.

The success metric must be an impact metric. "The team likes it" is not a success metric. "Proposal win rate increases from 22% to 28%" is a success metric. "Revenue per employee increases by AED 1,500 (USD 408)/month" is a success metric.

The kill criteria:

  1. After 60 days, the impact metric has not moved. Kill it.
  2. The team has stopped using it without being told to. Kill it.
  3. The maintenance cost keeps growing. It needed 2 hours in the first month and needs 8 hours a month now. Kill it.
  4. You cannot explain the ROI in one sentence. Kill it.
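The four kill criteria can be written down as a checklist in code. This is a sketch under stated assumptions: the `ToolReview` fields are hypothetical, "keeps growing" is simplified to "the latest month needs more hours than the first month", and an empty string stands in for "cannot explain the ROI in one sentence".

```python
from dataclasses import dataclass


@dataclass
class ToolReview:
    days_running: int
    metric_at_start: float        # the impact metric written down on day one
    metric_now: float
    team_still_using: bool
    maintenance_hours_first_month: float
    maintenance_hours_latest_month: float
    roi_sentence: str             # leave empty if you cannot write one


def kill_reasons(review: ToolReview, higher_is_better: bool = True) -> list:
    """Return every kill criterion the tool currently meets (empty list = keep)."""
    if higher_is_better:
        moved = review.metric_now > review.metric_at_start
    else:  # e.g. cost per project, where lower is better
        moved = review.metric_now < review.metric_at_start

    reasons = []
    if review.days_running >= 60 and not moved:
        reasons.append("impact metric has not moved after 60 days")
    if not review.team_still_using:
        reasons.append("team has stopped using it")
    if review.maintenance_hours_latest_month > review.maintenance_hours_first_month:
        reasons.append("maintenance cost keeps growing")
    if not review.roi_sentence.strip():
        reasons.append("ROI cannot be explained in one sentence")
    return reasons


# Illustrative review: win rate stuck at 22%, maintenance up from 2 to 8 hours.
stuck = ToolReview(60, 0.22, 0.22, True, 2, 8, "")
for reason in kill_reasons(stuck):
    print("Kill it:", reason)
```

One reason is enough to cancel. The list only makes the decision easier to explain to the team.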

Do not fall for sunk cost. The AED 7,400 (USD 2,020) you already spent does not justify spending AED 7,400 (USD 2,020) more. That money is gone whether you keep the tool or not.

What to measure with the current stack

Once you adopt Claude, Claude Code, and n8n, the metrics get more specific. Track these alongside your impact metrics so you can see where the spend is going and where the return is coming from.

Claude API spend per workflow. Every n8n workflow that calls the Claude API has a cost. Tag your API key by workflow if you can. A typical lead enrichment workflow runs at roughly AED 200 (USD 54) per month for 500 leads on Sonnet 4.6. A sales follow-up draft workflow runs at roughly AED 150 (USD 41) per month on Sonnet for a 30-deal pipeline. A monthly client report generator runs at roughly AED 250 (USD 68) per month for 15 clients. If a single workflow costs more than AED 500 (USD 136) per month, ask whether it is actually doing high-value work or just running.
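If you keep monthly spend per workflow in a sheet or export, the AED 500 check is a one-line filter. The sketch below is illustrative only: the workflow names are hypothetical labels, the amounts are the rough figures quoted above, and it assumes you have already worked out spend per workflow yourself (it does not call any Claude or n8n API).

```python
REVIEW_THRESHOLD_AED = 500  # above this, ask whether the workflow does high-value work

# Rough monthly spend per workflow in AED, using the figures from the text.
workflow_spend_aed = {
    "lead_enrichment": 200,
    "sales_follow_up_drafts": 150,
    "monthly_client_reports": 250,
}


def workflows_to_review(spend: dict, threshold: float = REVIEW_THRESHOLD_AED) -> list:
    """Names of workflows whose monthly spend exceeds the review threshold."""
    return sorted(name for name, cost in spend.items() if cost > threshold)


print(sum(workflow_spend_aed.values()))         # total monthly API spend
print(workflows_to_review(workflow_spend_aed))  # none of the typical workflows cross 500
```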

Token cost per Claude conversation in the team workspace. A founder using Claude.ai daily for strategy and drafting will spend roughly AED 110 (USD 30) per month on the seat. If a team member's usage triples that, ask why. Either they are doing real work that justifies it, or they are using the wrong model size. Push routine drafting to Sonnet, hard reasoning to Opus, high-volume classification to Haiku.

Hours saved per week with Claude Code. Track this for the one or two people running Claude Code. If your operations lead saves 6 hours a week building internal tools instead of paying a developer AED 8,000 (USD 2,180) for the same work, the maths is obvious. If they spend 3 hours a week and save 2, the tool is not yet earning its place.

n8n workflow value. For each workflow, write down what it replaces. "This replaces 4 hours of manual data entry per week at roughly AED 75 (USD 20) per hour, so it saves AED 1,200 (USD 327) per month." Compare that to the workflow cost. A workflow that costs AED 200 (USD 54) per month and saves AED 1,200 (USD 327) in labour is a clear win. A workflow that costs AED 200 (USD 54) and saves nobody any time is a hobby.
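The workflow value comparison is simple enough to sketch in code. The function names are hypothetical, and the sketch assumes a four-week month, which is what the AED 1,200 figure above implies (4 hours x 4 weeks x AED 75).

```python
WEEKS_PER_MONTH = 4   # assumption implied by the AED 1,200 example
HOURLY_RATE_AED = 75  # rough team hourly rate used in the text


def monthly_labour_saved(hours_per_week: float,
                         hourly_rate: float = HOURLY_RATE_AED) -> float:
    """Value of the manual work a workflow replaces, per month."""
    return hours_per_week * WEEKS_PER_MONTH * hourly_rate


def net_monthly_value(hours_per_week: float, workflow_cost: float,
                      hourly_rate: float = HOURLY_RATE_AED) -> float:
    """Labour saved minus what the workflow costs to run."""
    return monthly_labour_saved(hours_per_week, hourly_rate) - workflow_cost


print(monthly_labour_saved(4))    # 1200: the data entry example
print(net_monthly_value(4, 200))  # 1000: a clear win
print(net_monthly_value(0, 200))  # -200: a hobby
```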

The discipline is the same as before. Activity does not count. Money saved or money earned counts. Write the number down before you turn the workflow on.

Common mistakes

Measuring activity instead of impact. Your team sends you a report saying "AI handled 400 messages this month." That tells you nothing about whether those 400 messages created revenue or prevented churn. Ask: did any of those conversations lead to a signed contract or a renewed service agreement?

Stacking tools without stacking results. Each new AI subscription feels small. AED 500 (USD 136) here, AED 1,200 (USD 327) there. But five tools at an average of AED 800 (USD 218) each add up to AED 4,000 (USD 1,090) per month, or AED 48,000 (USD 13,075) per year. That is a junior employee's salary. Would that employee produce more than the five tools combined? Often, yes.

No baseline, no comparison. If you did not record your numbers before turning on the tool, you cannot prove it changed anything. You will convince yourself it helped because the team says it feels faster. Feelings are not data.

Letting the AI vendor define your success metrics. The vendor will show you usage dashboards. Logins, queries, documents generated. These are their metrics, not yours. Your metric is revenue per person and cost per project.

When to move on

You are ready to move on from this chapter when:

  • Every AI tool in your stack has a written ROI calculation updated monthly
  • You have killed at least one tool that was not performing (if you have not killed any, you are not measuring honestly)
  • Your team knows the difference between "this saves me time" and "this makes us money"
  • AI spending appears as a line item in your P&L with a clear return next to it

If you are paying for tools you cannot justify with a number, go back to the ROI calculation section and do the work before adding anything new.

Where to focus by team size

  • 10 to 19 people: Track time and output for your one AI use case. Keep it simple.
  • 20 to 34 people: Use the before/after framework for each active pilot. Kill anything that does not show impact in 60 days.
  • 35 to 50 people: Build AI metrics into your scoreboard alongside operational metrics.

Working prompts

People

  • Who on your team is responsible for tracking AI tool performance each month?
  • Does your team know why they are using each AI tool, or just how?
  • When someone says "this tool is great," do they mean it made their day easier or it made the business money?

Systems

  • Is there a monthly review where AI tool costs and returns are compared?
  • Do you have a shared spreadsheet or Zoho dashboard tracking the before/after numbers?
  • Are your AI tools connected to your actual sales or project data, or running in isolation?

AI

  • Which tools have a positive ROI based on impact metrics?
  • Which tools have been running for more than 60 days without a measured result?
  • If you cancelled every AI tool tomorrow, which one would you miss because of revenue impact, not convenience?

Founder exercise

Part A: Baseline audit (30 minutes)

Open your accounting software or Excel. For each AI tool you are currently paying for, write down:

  1. Tool name and monthly subscription cost in AED
  2. Hours per month your team spends learning, maintaining, or fixing it
  3. The impact metric it is supposed to improve (revenue per person, close rate, cost per project)
  4. The value of that metric before you started using the tool
  5. The value of that metric today

If you cannot fill in items 4 and 5, that is your first finding. You are paying for something you cannot measure.

Part B: Kill or keep decision (20 minutes)

For each tool, calculate the ROI using the formula above. Then apply the kill criteria:

  • Positive ROI with clear impact metric improvement: keep it, review again in 60 days
  • Flat or negative ROI after 60+ days: cancel it this week
  • Cannot calculate ROI because you have no baseline: set a baseline today and start the 60-day clock
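The three outcomes above can be expressed as a small decision function. This is a sketch, not an official ARCAS rule set: the `decide` function is hypothetical, `None` stands for "no baseline, so ROI cannot be calculated", and the final fallback covers cases the three bullets do not address (for example, a tool that has been running for under 60 days).

```python
from typing import Optional


def decide(roi: Optional[float], days_running: int, metric_improved: bool) -> str:
    """Map one tool's numbers to a kill-or-keep outcome.

    roi is the ROI percentage, or None when there is no baseline to compute it from.
    """
    if roi is None:
        return "set a baseline today and start the 60-day clock"
    if roi > 0 and metric_improved:
        return "keep it, review again in 60 days"
    if roi <= 0 and days_running >= 60:
        return "cancel it this week"
    # Not covered by the three bullets: decide manually.
    return "no rule applies yet, review manually"


print(decide(roi=-100.0, days_running=90, metric_improved=False))
print(decide(roi=1805.0, days_running=90, metric_improved=True))
print(decide(roi=None, days_running=120, metric_improved=False))
```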

Send the results to your operations lead or finance person on WhatsApp. Make the decision visible.

Part C: Forward budget (15 minutes)

Add up the total monthly cost of the tools you are keeping. Compare it to what you would pay a part-time or full-time hire. Write one sentence: "Our AI stack costs AED X/month and produces AED Y/month in measurable return." If you cannot write that sentence, your stack is not justified.

ARCAS lens

The diagnosis engine maps AI spending to two audits. The Revenue Model audit checks whether your pricing and delivery model actually benefits from AI assistance. If your revenue model depends on senior expertise and relationship selling, an AI proposal writer might not move the needle. The Conversion audit checks whether AI tools improve your ability to turn leads into paying clients.

In the Five Levels model, unmeasured AI spending is a cost leak. It sits in the systems layer, invisible because it feels productive. The monthly subscriptions do not trigger the same scrutiny as hiring a new person, but the cumulative cost is often equivalent.

People first, then systems, then AI. Measurement is what keeps that sequence honest. If the AI is not producing results that appear in your revenue or cost numbers, it is a cost, not a systems improvement.

Start now: Quick self-assessment

Score each row 1 (not in place) to 4 (fully operational). Be honest.

Area | Question | Score (1-4)
Baseline | I have written pre-AI numbers for revenue per employee, cost per project, and conversion rate |
Tracking | I review AI tool ROI monthly using impact metrics, not activity metrics |
Kill discipline | I have cancelled at least one AI tool that was not performing in the last 90 days |
Cost visibility | Total AI spend appears as a line item in my monthly P&L |
Team clarity | My team can explain why each AI tool exists in terms of business outcomes |
Budget justification | I can state the total monthly AI cost and the total monthly return in one sentence |

Score 20-24: Your measurement system is working. Focus on refining and expanding what is proven.

Score 13-19: You have pieces in place but gaps in discipline. Start with Part A of the founder exercise this week.

Score 6-12: You are spending on AI without measuring it. Stop adding tools. Go back to the baseline audit and build the foundation before anything else.
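For teams that keep the self-assessment in a script or dashboard, the scoring bands can be sketched as follows. The function name and the short band labels are hypothetical. The cut-offs are the ones given above for six rows scored 1 to 4.

```python
def assessment_band(scores: list) -> str:
    """Total six scores (each 1 to 4) and return the matching band."""
    if len(scores) != 6 or any(s not in (1, 2, 3, 4) for s in scores):
        raise ValueError("expected six scores, each between 1 and 4")
    total = sum(scores)
    if total >= 20:
        return "20-24: measurement system is working"
    if total >= 13:
        return "13-19: pieces in place, gaps in discipline"
    return "6-12: spending on AI without measuring it"


# Example: strong on baseline and tracking, weak on kill discipline.
print(assessment_band([4, 3, 1, 2, 3, 2]))
```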