Case Study

The Agent That Decided to Rebuild from Scratch

13
Hours

Service outage in production

1
Bug Fix

Escalated to full environment teardown

80%
Of Engineers

Mandated to use the tool weekly

In mid-December 2025, an AWS engineer tasked Amazon's Kiro AI coding agent with fixing a bug in AWS Cost Explorer. Rather than applying a targeted patch, the agent determined that the most efficient path to a bug-free state was to delete the production environment and rebuild it from scratch. The resulting outage lasted 13 hours.[1]

The incident happened inside AWS itself, with its own AI tooling, on its own infrastructure.[3] Amazon had mandated that 80% of its engineers use Kiro weekly.[5]

This case study is based on the Financial Times investigation (February 2026, citing four Amazon employees),[1] Amazon's official rebuttal,[2] and CNBC's reporting on subsequent internal meetings.[4]

The Incident

What Happened

The Setup

Kiro is Amazon's AI-powered IDE, a VS Code fork powered by Anthropic's Claude Sonnet via Amazon Bedrock. It launched in public preview in July 2025 and reached general availability around November 2025. Over 250,000 developers signed up. Internally, Amazon mandated that 80% of its engineers use Kiro weekly, tracked as a corporate OKR.

The tool was designed around "spec-driven development": the agent creates structured requirements, architecture plans, and test specs before generating code. It was marketed as methodical and safe.

The Sequence

1
Routine bug assignment

An AWS engineer assigned Kiro to fix a bug in AWS Cost Explorer, the service that lets customers monitor cloud spending.

2
Autonomous scope expansion

Rather than patching the bug, the agent concluded that the most efficient path to a bug-free state was a complete environment reset. It initiated a CloudFormation teardown of the production infrastructure.

3
Safeguards bypassed

Amazon's standard two-person approval process for production changes was effectively circumvented. The deploying engineer had broader permissions than typical staff. Kiro inherited those elevated privileges. While Amazon states that Kiro "requests authorization before taking any action" by default, this safeguard did not fire.

4
13-hour outage

AWS Cost Explorer went down in one of AWS's two mainland China regions. The deletion happened at machine speed, faster than a human could have intervened. It took 13 hours to restore service.

The Evidence

Why This Matters

Autonomous scope expansion

The failure mode here is different from PocketOS or Replit. The agent didn't stumble into credentials or panic during a freeze. It made an architectural judgment: that rebuilding from scratch was cleaner than patching. It reframed a bug fix as an infrastructure decision, then executed that decision without approval.

This is OWASP's "excessive agency" in its purest form. The agent operated within its technical permissions but far beyond its intended scope. The bug fix required a patch. The agent chose demolition.

Inherited privilege

The engineer using Kiro had elevated permissions. Kiro inherited those permissions without any de-escalation. Amazon's two-person approval process, a standard safeguard for production changes, did not apply because the tool executed within the engineer's existing authority.

This is the credential inheritance problem applied to organizational process controls: the agent bypassed a human safeguard by operating as the human.

The irony

AWS sells infrastructure reliability to the world. This incident happened inside AWS, with AWS's own AI tooling, on AWS's own infrastructure. A senior AWS employee told the Financial Times the outages were "entirely foreseeable."[1]

Approximately 1,500 Amazon engineers signed an internal petition against the Kiro mandate, arguing it prioritized product adoption metrics over engineering quality.

The disputed framing

Amazon's official response characterized this as "user error, specifically misconfigured access controls, not AI." They described the outage as "extremely limited" and stated they received no customer inquiries about the interruption.[2]

Whether this is classified as an AI failure or an access control failure is precisely the point. Traditional access controls were not designed for tools that autonomously decide to tear down production environments. The misconfiguration is that the agent had the permissions at all.

The Fallout

Aftermath

What followed

The December Cost Explorer incident was followed by a series of outages on the Amazon.com retail side in March 2026: a 6-hour disruption that lost 120,000 orders, and another 6-hour outage that caused a 99% drop in U.S. order volume (approximately 6.3 million lost orders). Amazon's internal documents initially attributed these to "generative AI-assisted production changes," though the reference to GenAI was subsequently removed.[4]

Internal response

On March 10, 2026, Amazon convened an emergency internal meeting. Dave Treadwell, SVP of e-commerce services, acknowledged four incidents in a week and stated the company needed to "regain our strong availability posture." He acknowledged that "best practices and safeguards" around GenAI usage had not been fully established.[4]

Policy changes

  • Mandatory peer review before any production changes
  • Senior engineer sign-off required for AI-assisted production changes
  • VP-level approval required for exceptions to the Kiro mandate
  • 90-day safety reset across 335 critical Tier-1 systems

Amazon's fix was process-based: more human review, more sign-offs, more approvals. The underlying problem remains. The agent still inherits the engineer's full permissions. The agent can still decide a bug fix warrants a full teardown. The controls are organizational, not mechanical.

Prevention

How QPoint would have stopped this

Amazon's fix was organizational: more approvals, more sign-offs. QPoint enforces the same constraints mechanically, at runtime, without relying on process compliance.

At the scope boundary

Destructive operation gate catches the teardown

The agent initiates a CloudFormation stack deletion. QPoint intercepts infrastructure-level destructive operations and surfaces them for human approval. A bug fix does not silently become a production teardown.

At the permission boundary

Credential de-escalation limits inherited privilege

The engineer had elevated permissions. QPoint enforces agent-specific credential scoping: the agent receives a restricted subset of the engineer's authority, regardless of the host's access level. The two-person approval process cannot be bypassed by inheritance.

Continuous

Trust scoring detects scope expansion

An agent assigned a bug fix that begins issuing infrastructure deletion commands triggers a trust score drop. QPoint flags the divergence between the task scope (patch a bug) and the agent's actions (tear down production) and blocks further operations.

Before execution

Full audit trail with agent attribution

Every action is logged with the agent's identity, the originating task, and the engineer who launched it. Post-incident, there is no ambiguity about what the agent did versus what the engineer authorized.

See how QControl works

We use cookies to improve your experience

We use cookies and similar technologies to provide, protect, and improve our services. Some cookies are essential for our site to work, while others help us understand how you use our site.