
Right now, the gap between AI coding tool hype and reality is wide enough to drive a truck through. LinkedIn would have you believe these tools are already navigating complex legacy architecture and shipping production-ready code autonomously. They’re not.
Meanwhile, most engineering leaders are dealing with more practical questions: where do these things actually help, and where do they just create different problems?
I’ve been working closely with engineering organizations as they navigate this, and the ones seeing success don’t treat AI tools as a productivity panacea.
Instead, they tend to move through four distinct stages: experimenting, adopting intentionally, measuring where it makes sense, and finally optimizing for cost. Without that deliberate approach (and a solid engineering foundation), these tools can create as many problems as they solve.
Let me walk through what each of those four stages looks like in practice.
Experimentation here means exposure, not necessarily rigorous A/B testing with control groups. Get your hands on these tools, use them on actual work, and talk about what happens. Some people prefer editors with inline suggestions; others find success coding on the command line with tools like Claude Code, describing what they want in plain language.
What’s interesting is how these tools can change your workflow in ways you don’t expect. The traditional advice about limiting work in progress starts to look different when you’re working with AI assistants. When Claude is working on something in one branch, you can start something else — though whether this actually reduces cognitive load seems to vary. Some engineers find juggling multiple parallel streams exhausting, while others find it helps them stay productive.
This doesn’t mean work-in-progress limits stop mattering. It just means you need to be thoughtful about what you’re working on. One branch for documentation, another for model updates? That works. Three agents all editing similar code, and you’re asking for merge conflicts.
The point is: experiment widely and pay attention to what works for you and your team.
At Swarmia, when these tools became available, we started using them, paying for them, and talking about what it was like to use them. That last part, the conversation, has proved especially important: sharing what works, what doesn’t, and which tasks benefit most from AI assistance. Engineers have strong preferences about their tools and environments, but creating space for those conversations helps everyone learn faster.
What to do in this stage:
What not to do:
Once you understand where these tools help, make adoption easy and intentional across your organization. This means removing friction — make signup simple, clearly communicate what tools are available, and set expectations around usage.
Some organizations mandate AI tool use at this stage, with varying levels of success. At Swarmia, we don’t have a mandate, but we ask that everyone at least tries the tools. We’re not demanding teams use them for everything, just that they set them up and understand how they work so the barrier to incremental adoption stays low.
Adoption isn’t only about changing habits. You also need to invest in systems and developer experience.
DORA’s research on AI adoption shows where to focus that investment: it identifies seven capabilities, and organizations that excel in them see better outcomes from AI tools, while those that don’t struggle regardless of which tools they choose.
Most of these capabilities were already important before AI (version control, small batches, quality platforms), but they’re much more important now.
The same practices that help human developers also help AI tools: modular code with clear responsibilities, comprehensive documentation, robust testing. If your organization is in the adoption phase, now’s a good time to get these capabilities squared away.
Now for what can be an uncomfortable topic: junior developers and AI tools. Some leaders worry about letting juniors use these tools. “They need fundamentals first. What if they submit poor quality code for review?”
And my answer to that is: yes, junior engineers submitting poor quality code (whether AI-assisted or hand-written) has real costs. But the solution isn’t restricting tool access. If your system can’t prevent low-quality code from reaching production regardless of how it was written, that’s a system problem. Not a people problem.
Strengthen your review process, invest in automated testing, and improve your deployment safeguards. These investments help your team regardless of which tools people use to write code — and they’re probably overdue anyway.
Tracking AI adoption gives you visibility into where barriers exist. If one team hits 80% adoption and another sits at 20%, it’s a signal to investigate.
Talk to the low-adoption team first. Maybe they’re working in a legacy codebase where AI tools struggle. Maybe they missed your communications. Maybe they tried the tools and found they didn’t fit their workflow. Maybe they need some training. Each conversation will tell you something useful about how these tools work in your context.
Then look at the high-adoption teams. What repositories are they using these tools on? What task types see the most activity? Which models or modes do they prefer? This data helps you identify patterns worth sharing. If your frontend team is getting great results using AI but your infrastructure team barely touches it, that’s valuable information too.
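If you want a rough, do-it-yourself read on adoption before reaching for dedicated tooling, a small script over exported pull request data is enough. The sketch below assumes a CSV export with team, author, and ai_assisted columns; those column names (and the export itself) are assumptions for illustration, not a prescribed format.

```python
# Rough adoption check: what share of each team's engineers opened at least
# one AI-assisted pull request recently? Assumes a CSV export with
# "team", "author", and "ai_assisted" columns -- hypothetical names.
import csv
from collections import defaultdict

engineers = defaultdict(set)  # team -> every PR author seen
ai_users = defaultdict(set)   # team -> authors with at least one AI-assisted PR

with open("pull_requests.csv", newline="") as f:
    for row in csv.DictReader(f):
        team, author = row["team"], row["author"]
        engineers[team].add(author)
        if row["ai_assisted"].strip().lower() == "true":
            ai_users[team].add(author)

for team in sorted(engineers):
    share = 100 * len(ai_users[team]) / len(engineers[team])
    print(f"{team}: {share:.0f}% of engineers used AI assistance "
          f"({len(ai_users[team])}/{len(engineers[team])})")
```

Even a crude number like this is enough to start the conversations described above; the point is to find the outliers, not to produce a precise metric.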
In Swarmia, you can see AI adoption rates, usage patterns across repositories and languages, and which modes and models teams prefer. This visibility helps you spot opportunities to increase adoption and understand what’s blocking progress — all before you start thinking about more rigorous productivity measurement.
What to do in this stage:
What not to do:
Once you have meaningful adoption, you can start examining AI’s impact on your delivery metrics. This is where things get interesting — and where you need to be thoughtful about what you’re measuring and why.
Start by looking at your existing engineering metrics through an AI lens. Measure the same things you’ve always measured — DORA metrics like change lead time, deployment frequency, change fail percentage, and failed deployment recovery time — but now segment by whether AI tools were involved.
This can give you an idea about whether AI-assisted work is moving through your system faster, slower, or about the same.
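As a concrete illustration of what that segmentation can look like, here’s a minimal sketch that compares cycle times for AI-assisted and other pull requests. It assumes you can export PRs with cycle_time_hours and ai_assisted fields; both names are hypothetical, and the comparison is a starting point rather than a methodology.

```python
# Minimal segmentation sketch: compare cycle time distributions for
# AI-assisted vs. other pull requests. Assumes a CSV export with
# "cycle_time_hours" and "ai_assisted" columns -- hypothetical names.
import csv
from statistics import median

buckets = {"AI-assisted": [], "other": []}

with open("pull_requests.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = "AI-assisted" if row["ai_assisted"].strip().lower() == "true" else "other"
        buckets[key].append(float(row["cycle_time_hours"]))

for name, values in buckets.items():
    if values:
        print(f"{name}: n={len(values)}, median cycle time = {median(values):.1f}h")
```

Medians keep a handful of long-running PRs from dominating the comparison, and reporting the sample sizes alongside them is a useful reminder of how thin this data can be early on.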
That’s different from trying to calculate an isolated “AI productivity gain” number, which is both methodologically impossible and potentially misleading.
When you see differences between AI-assisted and non-AI-assisted work, you’re looking at correlation, not causation. And correlation can point in multiple directions.
If AI-assisted pull requests have shorter cycle times, that could mean the tools are genuinely speeding up delivery, that your most experienced engineers also happen to be the heaviest AI users, or that the teams who were already performing well adopted the tools fastest.
All of these scenarios are valuable to understand, but they require different responses. If senior engineers are the main AI users, you might need better onboarding for less experienced team members. If high-performing teams adopt faster, maybe you need to address systemic barriers elsewhere.
The key is not trying to prove pure causation — that would require controlled experiments that aren’t practical in most organizations. Instead, you’re looking for patterns that help you make better decisions.
To interpret your data more effectively:
Also, we all know by now that it’s not helpful to compare individual engineers, and that shouldn’t change when you’re measuring the impact of AI. Team dynamics and dependencies are too complex for those comparisons to be useful, and after all, the unit of delivery is the team.
Instead, look for actionable patterns in your data:
Combine this quantitative data with qualitative feedback from your teams. When you spot interesting patterns — like one team’s AI-assisted PRs moving notably faster — talk to them, or run a developer experience survey focused on AI usage. What are they doing differently? Can other teams learn from their approach?
The goal in measurement isn’t to produce a single “AI ROI” number, but to understand whether AI tools are helping your teams deliver better quality software faster, where they’re most effective, and where you might need to adjust your approach.
What to do in this stage:
What not to do:
This is where too many people want to start: figuring out whether it makes sense to pay $200 or so per engineer per month for Claude Code, Copilot, or whatever the tool du jour happens to be.
If you go straight to cost optimization without the previous stages, you’ll never actually succeed with these tools. You need to know what works in your context before you can make smart decisions about consolidating tools or adjusting spend.
Once you’ve gone through the other stages, optimization becomes clearer. You consolidate tools based on data, reduce unnecessary usage by giving the AI better context, and discover that certain tools work better for certain use cases.
If these tools help your teams move faster, maintain quality, and reduce friction, spending the equivalent of one or two engineers’ salaries for 100 engineers’ worth of assistance is obviously worthwhile. At roughly $200 per engineer per month, 100 seats come to about $240,000 a year.
On the other hand, spending $250k on AI coding tool licenses won’t help much if your teams are waiting five days for code reviews or two weeks for deployments.
What to do in this stage:
What not to do:
The appetite for experimentation with these tools will diminish over time. We’re in a window right now where everyone is trying to figure this out, and while that window won’t last forever, there’s no need to panic or feel like you’re behind.
Just keep perspective: the measurement approach you need now isn’t fundamentally different from how you’ve always understood engineering effectiveness. You’re looking for the same signals — are we shipping quality software reliably? Can our teams do their best work without burning out? Is the work flowing smoothly through the system?
If you’re searching for real and sustainable results from AI, this four-stage approach works. Don’t skip stages because you’re impatient, and don’t let the hype convince you there’s a shortcut.
So just start. Experiment first, adopt intentionally, measure where it makes sense, then answer the cost question. These tools are promising, and with the right approach, they can genuinely improve how your teams work.