TL;DR

Google’s May 2026 whitepaper, “The New SDLC With Vibe Coding,” argues that software teams should focus less on model choice and more on the systems around AI coding agents. The paper claims generation is no longer the hard part; verification, judgment and direction now define the craft.

Google’s May 2026 whitepaper, “The New SDLC With Vibe Coding,” says the biggest change in software development is not a new programming language or cloud service but a shift from writing code to directing the AI systems that generate it. The claim has broad implications for teams already using coding agents in production workflows.

The paper, written by Addy Osmani, Shubham Saboo and Sokratis Kartakis, reports that as of early 2026, 85% of professional developers regularly use AI coding agents, 51% use them daily and about 41% of new code is AI-generated. The paper presents those figures as evidence that AI-assisted development has moved from experiment to routine practice.

Its central argument is that the model itself is only a small part of agent performance. The paper describes a running agent as the combination of a model and a “harness”: prompts, tools, context policies, hooks, sandboxes, sub-agents, observability and the surrounding engineering process. According to the paper’s framing, the harness accounts for roughly 90% of the practical behavior teams experience.
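The model-plus-harness split can be sketched in a few lines of Python. This is a minimal illustration, not the paper’s implementation: the names Harness, Agent and stub_model, the fields, and the “name:argument” tool-call convention are all assumptions made for the example. The point it shows is that the same model callable behaves differently once the harness around it changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below (Harness, Agent, stub_model, the "name:argument" tool
# convention) are hypothetical and chosen only for illustration.


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    pre_hooks: List[Callable[[str], str]] = field(default_factory=list)
    max_task_chars: int = 2000  # crude stand-in for a context policy


@dataclass
class Agent:
    model: Callable[[str], str]  # any text-in, text-out callable
    harness: Harness

    def run(self, task: str) -> str:
        # Hooks may rewrite the task before the model sees it.
        for hook in self.harness.pre_hooks:
            task = hook(task)
        task = task[: self.harness.max_task_chars]
        reply = self.model(f"{self.harness.system_prompt}\n\n{task}")
        # Assumed convention: a reply shaped "name:argument" requests a tool.
        name, _, argument = reply.partition(":")
        tool = self.harness.tools.get(name)
        return tool(argument) if tool else reply


def stub_model(prompt: str) -> str:
    """Fake model: always asks for the 'upper' tool on the last prompt line."""
    return "upper:" + prompt.splitlines()[-1]


bare = Agent(stub_model, Harness("You are a coding agent."))
tooled = Agent(
    stub_model,
    Harness("You are a coding agent.", tools={"upper": str.upper}),
)

print(bare.run("fix the bug"))    # the tool request comes back unhandled
print(tooled.run("fix the bug"))  # the harness resolves the request
```

Both agents share one model; only the harness differs, which is the kind of change the paper credits for most practical behavior.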

The whitepaper also separates casual “vibe coding” from what it calls agentic engineering. In that framing, vibe coding means loose prompting and minimal review, while agentic engineering uses specifications, automated tests, evals, CI gates and human architectural oversight. The paper argues that without both tests and evals, teams may still be relying on fragile AI output even when the prompt appears sophisticated.
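The infographic below puts the distinction as “Tests verify the deterministic; evals verify the rest.” A rough sketch of what a gate requiring both might look like follows. Everything here is assumed for illustration: slugify stands in for AI-generated code, stub_model and its canned replies stand in for a real model, and the keyword rubric is a deliberately simple scoring rule rather than anything the paper prescribes.

```python
from typing import Callable, List, Tuple


def slugify(title: str) -> str:
    """Stand-in for AI-generated code under review (hypothetical)."""
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Deterministic checks: exact inputs, exact expected outputs."""
    return (
        slugify("Hello World") == "hello-world"
        and slugify("  A  B ") == "a-b"
    )


def run_eval(
    generate: Callable[[str], str],
    cases: List[Tuple[str, str]],
    threshold: float,
) -> Tuple[float, bool]:
    """Score free-form output against a rubric; pass if the rate meets threshold."""
    hits = [keyword in generate(prompt) for prompt, keyword in cases]
    rate = sum(hits) / len(hits)
    return rate, rate >= threshold


def ci_gate(tests_ok: bool, eval_ok: bool) -> bool:
    """Both kinds of verification must pass before a merge is allowed."""
    return tests_ok and eval_ok


# Canned replies so the sketch runs without any model access (made-up data).
CANNED = {
    "Summarize: login refactor": "Refactors the login flow and adds tests.",
    "Summarize: uploader retry": "Adds retry with backoff to the uploader.",
    "Summarize: billing docs": "Misc changes.",
}


def stub_model(prompt: str) -> str:
    return CANNED[prompt]


CASES = [
    ("Summarize: login refactor", "login"),
    ("Summarize: uploader retry", "retry"),
    ("Summarize: billing docs", "billing"),
]

rate, eval_ok = run_eval(stub_model, CASES, threshold=0.9)
print(f"eval pass rate: {rate:.2f}, gate open: {ci_gate(run_tests(), eval_ok)}")
```

In this made-up run the unit tests pass but the eval rate falls below the threshold, so the gate stays closed. That is the case the paper warns about: deterministic tests alone would have let the change through.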

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

MODEL: ~10%

HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability). This is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: low CapEx · high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: high CapEx · low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Verification Becomes The Bottleneck

The paper matters because many software teams are already spending heavily on model access while treating verification, context management and workflow design as secondary work. Google’s authors argue that this order is backward: once AI can produce large amounts of code quickly, the cost shifts to checking whether that code is correct, maintainable and secure.

That has budget and operating implications. The source material describes casual AI coding as cheap upfront but costly later, with debt accruing from repeated fix loops, messy generated code and security remediation. By contrast, agentic engineering requires higher setup costs for specs, evals and tooling, but is presented as cheaper over time if it raises first-pass success and routes simpler work to lower-cost models.
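The trade-off can be made concrete with a simple linear cost model: total cost equals a one-time setup cost plus a per-feature cost times the number of features shipped. The sketch below is an assumption-laden illustration, not the paper’s accounting. The function names are hypothetical and every number is invented; the only figure borrowed from the source material is the 3× ratio, the low end of its “3–10× more per feature” range.

```python
import math
from typing import Optional


def cumulative_cost(setup: float, per_feature: float, features: int) -> float:
    """Linear model: one-time setup plus a constant cost per shipped feature."""
    return setup + per_feature * features


def break_even_features(
    vibe_setup: float,
    vibe_per_feature: float,
    agentic_setup: float,
    agentic_per_feature: float,
) -> Optional[int]:
    """Smallest feature count at which the agentic approach costs no more.

    Returns None when the agentic approach never catches up, that is,
    when it is not cheaper per feature.
    """
    saving = vibe_per_feature - agentic_per_feature
    if saving <= 0:
        return None
    return max(0, math.ceil((agentic_setup - vibe_setup) / saving))


# Invented numbers in arbitrary cost units: vibe coding has no setup cost
# but costs 3x as much per feature as the agentic workflow.
n = break_even_features(
    vibe_setup=0, vibe_per_feature=300,
    agentic_setup=5000, agentic_per_feature=100,
)
print(f"break-even after {n} features")
```

Under these invented inputs the two approaches cost the same at 25 features, and the agentic workflow is cheaper from the 26th onward. A larger per-feature gap pulls the break-even point earlier.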

From Vibes To Agentic Work

The whitepaper builds on the term “vibe coding,” associated in the source material with Andrej Karpathy’s February 2025 description of accepting AI-generated code and feeding errors back until something works. The Google paper treats that style as one point on a spectrum rather than a catch-all label for every use of AI in programming.

At one end are prototypes, disposable scripts and low-risk experiments. At the other are production systems where AI-generated work passes through tests, evals, CI/CD gates and review. The source material says the paper’s strongest practical line is that the difference is not whether AI is used, but how the output is verified.

The paper also cites benchmark evidence to support its harness argument. According to the source material, one team moved an agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing the harness while keeping the same model. It also cites a LangChain experiment that improved an agent score by 13.7 points through changes to prompts, tools and middleware.

“Generation is solved; verification, judgment, and direction are the new craft.”
— Osmani, Saboo and Kartakis, according to the Google whitepaper

Claims Still Need Proof

Several parts of the argument remain claims rather than settled facts. The paper’s 10% model and 90% harness split is described as a rough framing, not a universal measurement. It is also not clear from the source material how consistently that ratio applies across languages, codebases, teams or safety-critical software.

The reported adoption figures, benchmark references and cost claims should be read as source-attributed figures unless independently verified. The source material also notes that while the concepts are broadly tool-agnostic, Google’s recommended paths point toward its own Gemini, Jules and Agent Development Kit ecosystem.

Teams Test The Harness Thesis

The next step for software organizations is likely to be practical validation: measuring whether better specs, evals, tool access, context policies and CI gates improve agent performance more than switching models alone. Teams using AI coding agents will also need to decide which parts of their harness they own internally and which they accept from vendors.
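One way a team might run that comparison is a small ablation: hold the task suite fixed, change one thing at a time, and compare pass rates against a baseline. The sketch below assumes per-task pass/fail outcomes have already been recorded. The configuration names and the outcome lists are made up for illustration and are not measurements from the paper or any benchmark.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of tasks in the suite that the agent completed successfully."""
    return sum(outcomes) / len(outcomes)


def compare_to_baseline(
    runs: Dict[str, List[bool]], baseline: str = "baseline"
) -> Dict[str, float]:
    """Pass-rate change of each configuration relative to the baseline run."""
    base = pass_rate(runs[baseline])
    return {
        name: round(pass_rate(outcomes) - base, 3)
        for name, outcomes in runs.items()
        if name != baseline
    }


# Hypothetical outcomes on the same five tasks (invented, not measured).
runs = {
    "baseline": [True, False, False, True, False],       # model A, harness 1
    "harness_change": [True, True, False, True, True],   # model A, harness 2
    "model_change": [True, True, False, True, False],    # model B, harness 1
}

deltas = compare_to_baseline(runs)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1%} vs baseline")
```

A real comparison would need far more tasks and repeated runs before a difference between configurations meant anything; five tasks are used here only to keep the arithmetic visible.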

For readers, the immediate takeaway is that AI coding strategy is moving beyond model selection. The unresolved question is how quickly engineering organizations can build the verification systems needed to make AI-generated code reliable at scale.

Key Questions

What did Google’s whitepaper say about AI coding?

It argued that software development is shifting from manually writing code to expressing intent and directing AI systems, with verification and workflow design becoming central to quality.

What does “the model is only 10%” mean?

It means the paper treats the model as only one part of an agent. The rest is the surrounding harness: prompts, tools, context, tests, evals, sandboxes and observability.

Is vibe coding the same as AI-assisted engineering?

No. In the paper’s framing, vibe coding is the loose, high-risk end of the spectrum. Agentic engineering adds specifications, tests, evals, CI gates and human oversight.

Are the adoption numbers confirmed independently?

The figures cited here are attributed to the Google whitepaper and its source material. Independent confirmation is not included in the provided material.

Why does this matter for software teams?

If the paper’s argument holds, teams may get more value from improving verification and agent setup than from waiting for the next model release.

Source: Thorsten Meyer AI

The Model Is Only 10%: The Real Lesson of the New SDLC

Author: AI Espionage Team
