The Ralph Loop: Run AI Agents in a Bash Loop Until They Finish

Most developers treat AI coding like a chat interface: prompt, watch, and intervene when it loses the thread. This human-in-the-loop (HITL) approach is fine for prototypes, but it fails on complex features.

The Ralph Loop is a technique for running AI coding agents in a simple bash loop to achieve autonomous, unsupervised work.

It solves the "implicit execution budget" problem, where an AI declares a job "done" even if the tests are still red.

You use a loop because AI has no taste, and it will lie to you about tests passing if you let it. The bigger win is that it solves "context rot." Long sessions don't just fill up the window; the agent starts to summarize or "compact" older history, and your critical initial instructions can get lost in the summary.

By killing the session and restarting with a fresh context for every atomic task, you keep the agent sharp.

Cover image: an AI agent iterating in the Ralph Loop, each pass starting with a fresh context until the task is done

What is the Ralph Loop?

The Ralph Loop is not a complex framework; it's a simple bash while loop that runs an AI coding CLI like Claude Code or Amp. Coined by Geoffrey Huntley, the name refers to the Simpsons character Ralph Wiggum—the kid who fails constantly but keeps trying until he eventually succeeds. The loop treats failure as expected, not exceptional, forcing the agent to iterate until it meets binary success criteria.

Anthropic's plugin takes a different route: it stays inside one session and uses a stop hook to intercept the exit when the AI tries to end it. A "true" Ralph Loop runs outside the agent. It kills the process entirely and restarts it, so every iteration starts from a clean context.

This is the opposite of vibe coding, where you accept suggestions without scrutiny. In a Ralph Loop the agent, not the human, chooses the next task from a structured requirements file, explores the code, and implements changes until the "Completion Promise" sigil appears.

Why does looping work?

AI agents have a hidden execution budget. Once the model feels it has done a "reasonable" amount of work, it wraps up and exits based on how the code looks rather than how it works. You'll often find half-implemented APIs or skipped edge cases because the model decided it was "good enough."

Comparison of a long session suffering context rot versus the Ralph Loop starting each task with a fresh context

Worse, a context window is just an array of messages. Every message is appended to that array until the agent starts "compaction": summarizing earlier history to save space. Because the summary can drop the original project instructions, long sessions reason worse the longer they run. Looping fixes this by starting each task with a fresh context. The agent stays focused because it isn't carrying the baggage of previous failed attempts or bloated history.
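A toy illustration of that failure mode, assuming a hypothetical harness that keeps at most `max` messages and collapses the two oldest into one summary line once the limit is exceeded. Real agents compact very differently; the message texts and the limit here are invented purely to show how the first entry (the project instructions) is what gets summarized away.

```shell
#!/usr/bin/env bash
# Toy model only: a context window as a bash array with naive compaction.
# The limit and the "summary" text are illustrative assumptions.
context=()
max=4

add_message() {
  context+=("$1")
  if (( ${#context[@]} > max )); then
    # Collapse the two oldest entries into one lossy summary line.
    context=("[summary of earlier history]" "${context[@]:2}")
  fi
}

add_message "INSTRUCTIONS: never edit files under migrations/"
add_message "user: add signup endpoint"
add_message "agent: edited routes.js"
add_message "agent: tests failing, retrying"
add_message "agent: edited routes.js again"

# Print what the model would still "see" after the fifth message.
printf '%s\n' "${context[@]}"
```

After the fifth message the array holds four entries and the instruction line is no longer among them, which is the situation a fresh process per task avoids.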

Anatomy of a Ralph Loop

A working loop relies on state files to carry memory between context resets. Don't write the prd.json yourself; humans are bad at writing binary, testable requirements. Instead, "mould the clay" by talking the spec through with the AI, then ask it to generate the structured JSON.

Anatomy of a Ralph Loop: the bash loop with its core state files — prd.json, progress.txt, agents.md and the Completion Promise sigil

prd.json: The living TODO list with binary passes: false/true flags.
progress.txt: Short-term memory of decisions, blockers, and files changed.
agents.md: Long-term, project-wide patterns and conventions.
The PIN System: A Markdown lookup table linking specific features to filenames. It stops the agent from hallucinating directory structures or inventing file names.
The Completion Promise: A specific sigil (e.g. <promise>COMPLETE</promise>) the agent emits only when every prd.json item passes.

A minimal prd.json looks like this:

```json
{
  "requirements": [
    {
      "task": "Add users table migration",
      "passes": true
    },
    {
      "task": "Implement signup endpoint",
      "passes": false
    }
  ]
}
```
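As a rough sketch of how a wrapper script might read that file, the two helpers below find the next unfinished task and check whether everything passes. The function names are hypothetical, and the text matching assumes prd.json is pretty-printed exactly like the example above, with each "task" line directly before its "passes" line; a real JSON parser is the safer choice for anything less regular.

```shell
#!/usr/bin/env bash
# Sketch only: relies on the one-key-per-line layout of the example prd.json.

# Print the text of the first task whose flag is still false (empty if none).
next_task() {
  grep -B1 '"passes": false' "$1" | grep -m1 '"task"' |
    sed -E 's/^[[:space:]]*"task":[[:space:]]*"(.*)",?[[:space:]]*$/\1/'
}

# Succeed only when the file exists and no requirement is still marked false.
all_pass() {
  [ -f "$1" ] && ! grep -q '"passes": false' "$1"
}
```

Run against the example file, `next_task prd.json` prints `Implement signup endpoint`. A wrapper could also call `all_pass prd.json` before accepting the completion sigil, so the agent's word is never the only evidence.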

Getting started: from HITL to AFK

Don't jump straight to autonomous overnight builds. Learn the screwdriver before you pick up the jackhammer.

Three levels of Ralph automation: supervised manual run (HITL), attended loop, and autonomous overnight run (AFK)

Level 1 — The screwdriver (HITL): Run a single iteration by hand and watch the agent. This is where you refine the prompt and the AI-generated PRD.
Level 2 — The power drill (attended loops): Run 5–10 iterations at your desk. Catch mistakes early and pause the moment the agent starts going off-track.
Level 3 — The jackhammer (AFK): Once you trust the feedback loops, set the agent to run 30–50 iterations while you're away from the keyboard.

A practitioner's afk-ralph.sh script wires those iterations together:

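```bash
#!/usr/bin/env bash
# afk-ralph.sh: run the agent up to N times, stopping early on the completion sigil
set -e # exit on error
iterations=${1:?usage: afk-ralph.sh <iterations>} # fail fast if no count is given
for i in $(seq 1 "$iterations"); do
  echo "Iteration $i of $iterations"
  # -p runs Claude in non-interactive print mode; each call is a new process with a fresh context
  claude -p "Implement the next task in prd.json. Output <promise>COMPLETE</promise> when all tasks pass." | tee output.log
  # -F treats the sigil as a literal string rather than a regex
  if grep -qF "<promise>COMPLETE</promise>" output.log; then
    echo "All tasks complete."
    break
  fi
done
```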

Principles that keep Ralph on the rails

Engineering guardrails are the only thing standing between you and a $100 token bill with nothing to show for it.

Small steps: Tasks must be atomic. If a task is too big, the agent runs out of context and produces garbage.
Feedback loops: Non-negotiable. You cannot trust an AI to judge its own work.
Risk prioritization: Tackle architectural "spikes" and integration points first. Failing fast on a hard problem beats finishing ten easy UI tasks on a broken foundation.

A feedback loop is any check the agent can't argue with:

Feedback type	What it catches	Priority
Typecheck	Type mismatches, missing props	Essential
Linting	Code style, obvious logic bugs	High
Unit tests	Broken logic, regressions	Critical
Playwright	UI bugs the model can't "see" in code	High
Build check	Compilation or dependency errors	Critical
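One way to bundle those checks into a single gate is sketched below. The `run_checks` name and the npm scripts in the example call are assumptions; substitute whatever commands your project uses for typechecking, linting, testing, and building.

```shell
#!/usr/bin/env bash
# Sketch of a feedback gate: run every check, report each result, and fail
# if any check fails. The commands passed in are project-specific assumptions.

run_checks() {
  local failed=0 cmd
  for cmd in "$@"; do
    # Send the check's own output to stderr so stdout carries only the summary.
    if bash -c "$cmd" >&2; then
      echo "PASS: $cmd"
    else
      echo "FAIL: $cmd"
      failed=1
    fi
  done
  return "$failed"
}

# Example call (hypothetical npm scripts):
#   run_checks "npm run typecheck" "npm run lint" "npm test" "npm run build"
```

Every check runs even after one fails, so the agent sees the full list of problems in a single pass instead of discovering them one iteration at a time.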

When Ralph fits — and when it doesn't

Ralph is "pay to play." High-end models like Claude Sonnet cost roughly $10/hour to loop. Local models aren't viable yet; they lack the reasoning to manage these autonomous state transitions.
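To turn that hourly figure into a guardrail, a loop can stop itself once a time-based budget is spent. This is a back-of-the-envelope sketch: the helper names are hypothetical, the rate is the rough $10/hour quoted above rather than a measured price, and elapsed time is only a proxy for actual token spend.

```shell
#!/usr/bin/env bash
# Budget guard sketch using integer arithmetic only (whole dollars).

# Seconds of looping that a dollar budget buys at a given hourly rate.
budget_seconds() {
  local budget_usd=$1 rate_usd_per_hour=$2
  echo $(( budget_usd * 3600 / rate_usd_per_hour ))
}

# Succeed while elapsed time (bash's built-in SECONDS counter) is under budget.
within_budget() {
  (( SECONDS < $(budget_seconds "$1" "$2") ))
}
```

At $10/hour a $100 budget buys `budget_seconds 100 10` = 36000 seconds, or ten hours. Inside the loop, a line such as `within_budget 100 10 || break` would end the run once that time has elapsed.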

Safety and the lethal trifecta

The "lethal trifecta" is the combination of untrusted input, internet access, and access to secret data. To defuse it, always run AFK agents inside Docker sandboxes. Then a runaway agent or a malicious prompt injection that runs rm -rf / only wipes the container, and it cannot steal SSH keys that were never mounted. Ralph is great for generating 90% of a feature overnight, but you still review the git log in the morning.
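A minimal sketch of such a sandbox invocation follows. The `sandbox_cmd` helper and the `ralph-sandbox` image name are hypothetical; the image is assumed to already contain the agent CLI. The function only prints the command, so it is safe to try without Docker installed. Note that the container still needs network access and some credential for the agent, which you would have to supply yourself, so this narrows the blast radius rather than removing every risk.

```shell
#!/usr/bin/env bash
# Sketch: build a docker command that exposes only the project directory.
# "ralph-sandbox" is a hypothetical image name, not a published image.

sandbox_cmd() {
  local project_dir=$1 image=$2
  # --rm discards the container afterwards; -v mounts only the project, so
  # ~/.ssh and the rest of the host filesystem are never visible inside.
  printf 'docker run --rm -v %q:/workspace -w /workspace %q ./afk-ralph.sh 20\n' \
    "$project_dir" "$image"
}

# Example: sandbox_cmd "$PWD" ralph-sandbox
```

Because only the project directory is mounted, the worst a destructive command can do is damage that checkout, which the git history lets you restore.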

FAQ

Is Ralph a plugin? No. A proper Ralph loop runs outside the agent as a bash script. Plugins that run inside the agent don't fix context rot because they never reset the context array.

Can it run in parallel? Not reliably; it's a red hot mess. Coordinating non-deterministic agents usually ends in contention, with each one stepping on the others' toes. Stick to a single sequential loop for reliability.

Why use prd.json instead of a plain list? JSON gives you unambiguous binary tracking. The agent can programmatically flip passes: true once its feedback loops go green.

Can Ralph fix bugs automatically? Yes. Point Ralph at linting errors or failing tests and let it iterate until the status is green. It's software entropy running in reverse.
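The bug-fixing variant can be sketched as a bounded retry: run a check, and while it is red, hand the problem to a fixer. The `fix_until_green` name is hypothetical and both commands are supplied by the caller, so nothing here assumes a particular test runner or agent.

```shell
#!/usr/bin/env bash
# Sketch: rerun a fixer until a check passes or the attempt limit is reached.

fix_until_green() {
  local max_fixes=$1 check_cmd=$2 fix_cmd=$3 fixes=0
  until bash -c "$check_cmd"; do
    # Give up once the allowed number of fix attempts has been used.
    (( fixes >= max_fixes )) && return 1
    bash -c "$fix_cmd" || true   # a failed fix attempt is expected, not fatal
    fixes=$(( fixes + 1 ))
  done
  echo "green after $fixes fix attempt(s)"
}

# Example call (assumes an npm test script exists in the project):
#   fix_until_green 10 "npm test" 'claude -p "Fix the failing tests without deleting or skipping any."'
```

The check command, not the agent, decides when the work is done, and the attempt limit keeps a stubborn failure from looping forever.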
