What AI Really Delivers in Code and Agile: Lessons from 100 Tasks Across 15 Projects

By Tintash • December 3, 2025

This article distills the key insights from a Lightning Talk by Murad Akhter, Co-Founder of Tintash and CTO of onBeacon.ai.

Real-world use of AI in software development shows that the acceleration is meaningful but inconsistent. Tintash analyzed about 100 engineering tasks across 15 projects and found that AI excels at structured, low-context work and declines sharply on complex or revision-heavy tasks. The leaders who benefit most apply AI selectively rather than universally.

Strategic Insights

AI performs extremely well on documentation, research, code conversion, and boilerplate.
Productivity varies by roughly 30 percent across sprints, which reduces planning reliability.
Context-heavy tasks reduce AI effectiveness and increase rework.
QA becomes the bottleneck when testing cannot match accelerated coding.
AI is an assistant, not an engineer. Senior oversight remains essential.

Where AI Consistently Performs Well

Tintash’s review shows that AI provides its strongest and most predictable benefits on low-context, highly structured tasks. These tasks let the model work within clean boundaries and produce usable first drafts that need minimal revision.

High performing areas include:

documentation and READMEs
API and technical writing
code conversion
boilerplate
simple prefabs
early drafts and architectural exploration
research and structured utilities

These tasks deliver reliable acceleration because dependencies are limited and quality expectations are clear.

This visual highlights the types of tasks where teams consistently saw the largest gains.

Why Productivity Declines in Real Projects and Where AI Struggles Most

Why Productivity Declines in Real Projects

The same teams that saw strong gains in structured tasks also reported significant variability when work required deeper context. Teams observed:

context window limitations
sprint-to-sprint volatility
tool instability and model degradation
uneven access to premium tools
AI-driven scope expansion

This volatility made sprint planning difficult and reduced the reliability of the early productivity gains.

This visual shows how teams using similar tools and similar task patterns still experienced large fluctuations across sprints.

Why AI Struggles With Legacy Code and Complex UI Work

AI performs well on first drafts but breaks down as context increases. Legacy systems carry hidden dependencies and structures built up over many years that exceed what the model can reason about. Multi-step UI flows behave similarly: as changes accumulate, accuracy declines and rework increases.

Areas where AI performs poorly include:

legacy code
complex logic
multi-step UI flows
deeply intertwined systems
hardware-related QA

Even the best current AI tools cannot ingest and reason about complex in-project context well enough to help. For example, tournament matchmaking logic for one of our game projects took 3 days with AI, the same as doing it manually.

Why QA Becomes the New Bottleneck

AI accelerates coding, but QA does not speed up at the same rate. This mismatch creates bugs, rework, and late-sprint delays. Testing becomes the gating factor because validation still requires human review and an understanding of the surrounding context.

The net effect: coding becomes faster, but delivery does not.
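The bottleneck argument can be made concrete with a small throughput sketch. It assumes a simple two-stage pipeline in which every task is coded and then tested, so delivered throughput is capped by the slower stage. The function names and all rates below are hypothetical illustrations, not figures from Tintash's review.

```python
# Minimal sketch of a serial two-stage delivery pipeline (coding -> QA).
# All rates are hypothetical, in tasks per sprint; they are NOT
# measurements from the Tintash review.

def delivered_per_sprint(coding_rate: float, qa_rate: float) -> float:
    """Throughput of a serial pipeline is limited by its slowest stage."""
    return min(coding_rate, qa_rate)


def qa_backlog_growth(coding_rate: float, qa_rate: float) -> float:
    """Untested work that accumulates each sprint when coding outpaces QA."""
    return max(0.0, coding_rate - qa_rate)


baseline = {"coding_rate": 10.0, "qa_rate": 10.0}
# Hypothetical 50% coding speedup from AI, with QA capacity unchanged.
with_ai = {"coding_rate": 15.0, "qa_rate": 10.0}

for label, rates in (("baseline", baseline), ("with AI", with_ai)):
    print(
        f"{label}: delivered={delivered_per_sprint(**rates)}, "
        f"backlog growth={qa_backlog_growth(**rates)}"
    )
```

Under these assumptions, faster coding leaves delivery flat and adds five untested tasks per sprint. Real pipelines also send rework from QA back to coding, which makes the effect worse, so this is an illustration of the constraint rather than a model of any team.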

How Team Structure Influences AI Outcomes

Murad’s review highlighted a consistent pattern. Teams with senior architectural oversight see fewer errors and more predictable gains. Senior engineers set guardrails, choose the right tasks for AI, and reduce rework by 10 to 40 percent.

Junior-heavy teams experience more volatility. Under tight deadlines, rework increases because AI-generated code is accepted without adequate review.

This visual highlights how different team compositions experience different levels of stability and rework.

A Practical Framework for Leaders

The data suggests a simple, actionable approach for matching AI to the right work.

Use AI aggressively for:

standalone modules
boilerplate
documentation
utilities
low-context tasks

Use AI selectively for:

UI drafts
mid-level features
code review

Avoid AI for:

legacy code
complex logic
multi-step UI flows
hardware QA

This selective approach stabilizes outcomes and reduces volatility.
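A team that wants to apply the three tiers consistently, for example when triaging a sprint backlog, could encode them as a lookup. The sketch below is one hypothetical way to do that: the tag strings, the `AIUsage` enum, and the `classify` function are illustrative names, not part of Tintash's framework. It also assumes that when a task carries several tags, the most restrictive tier wins, and that unrecognized tasks go to a senior engineer to decide.

```python
from enum import Enum
from typing import Optional, Set


class AIUsage(Enum):
    """Tiers from the framework, ordered from least to most restrictive."""
    AGGRESSIVE = 1
    SELECTIVE = 2
    AVOID = 3


# Tags mirror the framework's lists; the exact strings are hypothetical.
TIER_BY_TAG = {
    "standalone module": AIUsage.AGGRESSIVE,
    "boilerplate": AIUsage.AGGRESSIVE,
    "documentation": AIUsage.AGGRESSIVE,
    "utility": AIUsage.AGGRESSIVE,
    "low context": AIUsage.AGGRESSIVE,
    "ui draft": AIUsage.SELECTIVE,
    "mid-level feature": AIUsage.SELECTIVE,
    "code review": AIUsage.SELECTIVE,
    "legacy code": AIUsage.AVOID,
    "complex logic": AIUsage.AVOID,
    "multi-step ui flow": AIUsage.AVOID,
    "hardware qa": AIUsage.AVOID,
}


def classify(tags: Set[str]) -> Optional[AIUsage]:
    """Return the most restrictive tier among a task's recognized tags.

    Returns None when no tag is recognized; in this sketch that means a
    senior engineer decides (an assumption, not part of the framework).
    """
    tiers = [TIER_BY_TAG[t.lower()] for t in tags if t.lower() in TIER_BY_TAG]
    if not tiers:
        return None
    return max(tiers, key=lambda tier: tier.value)


print(classify({"boilerplate"}))                 # AIUsage.AGGRESSIVE
print(classify({"boilerplate", "legacy code"}))  # AIUsage.AVOID
print(classify({"database migration"}))          # None
```

The "strictest tier wins" rule reflects the article's point that context, not task type alone, drives rework: boilerplate inside a legacy system is still legacy work.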

Conclusion

AI offers real acceleration, but the acceleration is uneven. Teams that use AI selectively, add senior oversight, and strengthen QA can harness meaningful gains. Teams that expect uniform improvement across all tasks face inconsistent results and avoidable rework.
