The Three-Body Agent: Orchestrating Agents with GitHub Actions and Claude Code
Getting an AI agent to write code is the easy part. The harder problem, the one most teams discover only after the novelty fades, is everything around it: how do you schedule agents reliably and know when they’ve silently failed? How do you replicate the setup to a new project without rebuilding from scratch? The orchestration layer is where autonomous coding pipelines actually break down.
In my previous post, I described the agent factory I built for IGNIO: autonomous agents that pick issues from a project board, implement them, open PRs, and fix their own CI failures. That system shipped 182 PRs in 10 days and has since grown to 240 issues implemented autonomously. The agents worked. The architecture around them did not.
The whole pipeline ran through OpenClaw on my local machine (which doubles as my CI runner), spawning Claude Code sessions via subscription tokens. OpenClaw was the orchestration layer, and it was the weak point. If the process crashed, the entire pipeline went silent with no signal that anything had broken. The only audit trail was whatever I could dig out of session logs after the fact. Replicating the setup to another project meant copying scripts manually and hoping nothing was missed. What I really wanted was a system that lived alongside the code it operated on and could be replicated to a new repository by copying a directory.
I already knew the architecture was wrong. Then, on April 4, 2026, Anthropic made the decision urgent. They blocked subscription tokens from working with third-party tools like OpenClaw. Developers had been routing frontier AI through flat-rate subscriptions while consuming compute that should have been billed per-token. Anthropic closed the arbitrage.
I bit the bullet. I moved everything to pay-per-token API keys, dropped the spawned subscription sessions entirely, and chose GitHub Actions as the new orchestration runtime. No grey areas, no workarounds.
Anthropic does offer a first-party GitHub Action (anthropics/claude-code-action), and it’s excellent for interactive use cases like PR review and @claude mentions. But it’s not the right fit for long-running autonomous agents. The action is optimized for shorter interactions, and its permission model requires enumerating every allowed tool explicitly. For 90-minute sessions with 500 max turns doing full-issue implementation, calling Claude Code CLI directly is simpler and gives full control. The pipeline’s real complexity (board scanning, priority ranking, dependency detection, GraphQL mutations) is shell logic that runs before and after the Claude step. The action doesn’t help with any of that.
The migration took about two days. The machine running the agents didn’t change. My self-hosted runner is the same development environment I’ve always used. What changed is everything around it: the orchestration now lives in version-controlled workflow files inside .github/workflows/, every run is auditable through the Actions UI, and failures surface as notifications instead of silence. Copying that directory to a new repo replicates the entire agent pipeline.
What follows is the full architecture and what I learned running this at scale.
I call this architecture the Three-Body Agent, a nod to the physics problem where three gravitationally bound bodies produce motion so complex it defies closed-form prediction. Replace celestial bodies with autonomous agents and the analogy holds: three agents orbiting the same codebase, each on its own schedule, their interactions producing stable results that none of them could achieve alone.
The three bodies are the Implementer, the Fixer, and the Merger. The Implementer picks issues and writes code. The Fixer watches for CI failures and code review comments, then autonomously repairs them. The Merger scans for fully green PRs and merges them. Each agent runs independently, but their orbits overlap: the Implementer creates work that triggers the Fixer, and the Fixer produces clean PRs that the Merger picks up.
┌─────────────────────────────────────────────────────┐
│ The Three-Body Agent │
│ (orbiting the same codebase) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Implementer │ │ Fixer │ │ Merger │ │
│ │ (hourly) │─→│ (every 30m) │─→│ (every 2h) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Supporting workflows: │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Board Sync │ │ Rollover │ │
│ │ (PR events) │ │ (weekly) │ │
│ └─────────────┘ └─────────────┘ │
│ │
│ Infrastructure: GitHub Actions + Claude Code CLI │
│ State: GitHub Projects V2 (board, milestones) │
│ Visibility: Telegram notifications │
└─────────────────────────────────────────────────────┘
Six GitHub Actions workflows make up the system. The three core agents do the autonomous work, while Board Sync and Week Rollover handle project management. A reusable Telegram workflow provides visibility at every stage, and GitHub Projects V2 is the shared state.
The implementer’s first task, before any code gets written, is picking the right issue. This runs as a prepare job in the implementer workflow, triggered every hour, and its goal is simple: find the highest-priority TODO issue on the board and dispatch the implementation.
name: "[AUTOAGENT] Implementer"
on:
  schedule:
    - cron: "0 * * * *"  # Every hour
  workflow_dispatch:
    inputs:
      force_issue:
        description: "Force a specific issue number"
        required: false
        type: string
“Pick the best one” is the hard part. With 28 TODO issues on the board, the agent needs a clear decision framework. The selection logic works in layers:
First, guard against duplicates. Does this issue already have an open PR? If yes, skip it. But multiple issues can be in progress at the same time. Early versions of this system enforced a strict one-at-a-time policy, but that turned out to be unnecessarily conservative. GitHub’s concurrency groups handle queuing, and the agents don’t step on each other as long as they’re working on different branches.
# Check if issue already has an open PR
EXISTING_PR=$(gh pr list --search "head:autoagent/${ISSUE_NUM}" \
  --state open --json number --jq 'length')
if [ "$EXISTING_PR" -gt 0 ]; then
  echo "Issue #${ISSUE_NUM} already has an open PR. Skipping."
  exit 0
fi
Then, priority ranking. I use p0 through p5 labels. The orchestrator sorts by priority first, then by issue body length as a tiebreaker (longer descriptions tend to be better specified, which means higher success rates for autonomous implementation).
BEST_ISSUE=$(echo "$TODO_ITEMS" | jq -s '
  sort_by(
    (if (.labels | index("p0")) then 0
     elif (.labels | index("p1")) then 1
     elif (.labels | index("p2")) then 2
     elif (.labels | index("p3")) then 3
     elif (.labels | index("p4")) then 4
     elif (.labels | index("p5")) then 5
     else 6 end),
    -.bodyLength  # negated: an ascending sort would otherwise pick the SHORTEST body
  ) | .[0]
')
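To sanity-check the ranking without a live board, here is a standalone run on fake data (issue numbers and lengths are made up; the bodyLength is negated so the longer, better-specified description wins the tie, matching the stated intent):

```shell
# Three fake board items: two p1 issues and one p2. Only jq is required.
TODO_ITEMS='{"number":360,"labels":["p2"],"bodyLength":800}
{"number":359,"labels":["p1"],"bodyLength":300}
{"number":358,"labels":["p1"],"bodyLength":900}'
BEST_ISSUE=$(echo "$TODO_ITEMS" | jq -s '
  sort_by(
    (if (.labels | index("p0")) then 0
     elif (.labels | index("p1")) then 1
     elif (.labels | index("p2")) then 2
     else 6 end),
    -.bodyLength
  ) | .[0].number')
echo "$BEST_ISSUE"   # 358 — both p1 issues outrank the p2, and #358 has the longer body
```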
I’ll come back to why this simple label system works surprisingly well in the “Steering Work” section.
Then, dependency detection. If an issue body contains “Depends on: #359”, the orchestrator finds #359’s open PR branch and passes it as the base branch instead of main. This way the implementer builds on top of the dependency rather than conflicting with it.
# Simple dependency detection (sed instead of grep -P,
# which BSD/macOS grep does not support)
DEP_NUM=$(echo "$ISSUE_BODY" | \
  sed -nE 's/.*Depends on:? *#?([0-9]+).*/\1/p' | head -1)
if [ -n "$DEP_NUM" ]; then
  DEP_BRANCH=$(gh pr list --search "head:autoagent/${DEP_NUM}" \
    --state open --json headRefName --jq '.[0].headRefName // empty')
  if [ -n "$DEP_BRANCH" ]; then
    BASE_BRANCH="$DEP_BRANCH"
  fi
fi
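The extraction itself can be checked against sample issue bodies (sample strings are made up; the sed form is used because BSD/macOS grep lacks -P):

```shell
# Pull the first "Depends on" issue number out of an issue body, or nothing.
extract_dep() {
  printf '%s\n' "$1" | sed -nE 's/.*Depends on:? *#?([0-9]+).*/\1/p' | head -1
}
extract_dep "Adds recap UI. Depends on: #359"   # prints 359
extract_dep "Depends on #12 and #13"            # prints 12 (first match wins)
extract_dep "No dependency declared"            # prints nothing
```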
Finally, dispatch. The orchestrator moves the issue to IN PROGRESS on the board and dispatches the implementer workflow. A Telegram notification goes out at the same time.
# Move to IN PROGRESS via GraphQL mutation
gh api graphql -f query="
mutation {
  updateProjectV2ItemFieldValue(input: {
    projectId: \"${PROJECT_ID}\"
    itemId: \"${ITEM_ID}\"
    fieldId: \"${STATUS_FIELD_ID}\"
    value: { singleSelectOptionId: \"${IN_PROGRESS_ID}\" }
  }) { projectV2Item { id } }
}
"

# Dispatch implementer
gh workflow run autoagent-implementer.yml \
  -f issue_number="$ISSUE_NUM" \
  -f base_branch="$BASE_BRANCH" \
  -f model=claude-opus-4-6
The entire orchestrator is about 150 lines of shell and GraphQL. No framework or SDK. The only dependencies are gh and jq.
This is the heavy lifter. It receives an issue number, checks out the repo, and hands everything to Claude Code CLI with a structured prompt: 90-minute timeout, 500 max turns, model selectable per run (Opus for complex work, Sonnet for simpler issues).
name: Autoagent Implementer
on:
  workflow_dispatch:
    inputs:
      issue_number:
        description: "GitHub issue number to implement"
        required: true
        type: string
      base_branch:
        description: "Base branch"
        required: false
        default: "main"
        type: string
      model:
        description: "Claude model to use"
        default: "claude-opus-4-6"
        type: choice
        options:
          - claude-opus-4-6
          - claude-sonnet-4-6
jobs:
  implement:
    runs-on: [self-hosted, macOS]
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
The prompt is the most important part. Not because it’s clever, but because it’s structured. Claude Code works best when you give it a structured sequence of steps with explicit rules about what NOT to do.
The prompts live in .github/prompts/ as separate Markdown files, loaded at runtime with envsubst for variable interpolation. This keeps the workflow YAML clean and makes the prompts easy to iterate on without touching the workflow logic.
envsubst '$ISSUE_NUM $ISSUE_TITLE $ISSUE_BODY $BASE' \
  < .github/prompts/implementer.md | \
  claude -p --model "$MODEL" --max-turns 500 \
    --permission-mode auto
Here’s the implementer prompt (.github/prompts/implementer.md):
You are an autonomous implementer agent.
Follow these steps IN ORDER. Do not skip steps.
## Issue #${ISSUE_NUM}: ${ISSUE_TITLE}
${ISSUE_BODY}
## Step 0: Understand the Project
Read ALL documentation for full context BEFORE writing any code.
Understand the tech stack, coding patterns, naming conventions,
and project structure.
## Step 1: Create Branch
git checkout -b autoagent/${ISSUE_NUM}-<short-slug>
## Step 2: Plan
- Which files to change and why
- Which new files to create (prefer editing existing files)
- Which tests to add or update
## Step 3: Implement
- Match existing patterns
- Keep it simple. No over-engineering
- No new dependencies without strong justification
## Step 4: Test
- Run the test suite and fix failures
- Do not lower coverage thresholds
## Step 5: Commit, Push, Open PR
- Commits grouped by context
- PR must include 'Closes #${ISSUE_NUM}'
## Rules
- Do NOT refactor code unrelated to the issue
- Do NOT update dependencies unless the issue requires it
- Do NOT force-push or rewrite history
- Do NOT merge the PR
A few things I learned about this prompt structure:
Step 0 matters more than you think. Without the explicit “read the docs first” instruction, Claude would jump straight into coding and miss project conventions. With it, the agent reads AI_INSTRUCTIONS.md (which has naming patterns, file structure rules, test conventions) and the architecture docs before touching anything. The success rate jumped noticeably once I added this step.
Negative rules are essential. “Do NOT refactor unrelated code” prevents the scope creep that kills autonomous PRs. Without that rule, the agent would routinely touch 15 files when only 3 were needed, making code review painful and introducing unrelated regressions.
--permission-mode auto is the right default for CI. It runs a safety classifier that reviews every action, blocking genuinely dangerous operations (mass deletion, force push, external code execution) while allowing normal development work. If the classifier blocks an action repeatedly, the session aborts, which is a safe failure mode for headless runs. For fully isolated disposable runners, you could switch to bypassPermissions, but on shared or persistent runners auto gives you autonomy without disabling all guardrails.
On success, the workflow comments on the issue with a link to the PR. On failure, it comments with a link to the workflow run so you can debug.
# On success
gh issue comment "$ISSUE_NUM" \
--body "✅ Implementation complete. PR: ${PR_URL}"
# On failure
gh issue comment "$ISSUE_NUM" \
--body "⚠️ Autoagent failed. [Workflow run]($RUN_URL)"
Every 30 minutes, the fixer scans for autoagent PRs that need attention: failed CI, review comments requesting changes, or merge conflicts. It gathers all the failure context, feeds it to Claude, and pushes fixes.
name: "[AUTOAGENT] Fixer"
on:
  schedule:
    - cron: "*/30 * * * *"  # Every 30 minutes
  check_suite:
    types: [completed]  # Auto-trigger on failed CI
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to fix"
        required: true
        type: string
      fix_type:
        description: "What to fix"
        default: "all"
        type: choice
        options:
          - all
          - ci
          - review
          - conflict
The real value is in context gathering. Before calling Claude, the fixer builds a comprehensive failure report:
# CI check results
gh pr checks "$PR_NUM" >> /tmp/failure-context.md

# Failed run logs (last 300 lines)
FAILED_RUN=$(gh run list --branch "$BRANCH" --status failure \
  --limit 1 --json databaseId -q '.[0].databaseId')
gh run view "$FAILED_RUN" --log-failed | tail -300 >> /tmp/failure-context.md

# Code review comments
gh api "/repos/$REPO/pulls/$PR_NUM/comments" \
  --jq '.[] | "### \(.path):\(.line)\n\(.body)\n"' >> /tmp/failure-context.md

# Review verdicts (changes requested)
gh api "/repos/$REPO/pulls/$PR_NUM/reviews" \
  --jq '.[] | select(.state == "CHANGES_REQUESTED") |
        "### Review by \(.user.login):\n\(.body)\n"' >> /tmp/failure-context.md

# Merge conflict check
MERGEABLE=$(gh pr view "$PR_NUM" --json mergeable -q '.mergeable')
if [ "$MERGEABLE" = "CONFLICTING" ]; then
  echo "## ⚠️ MERGE CONFLICT" >> /tmp/failure-context.md
fi
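The jq template used for review comments can be sanity-checked against a canned payload before pointing it at the live API (the payload below is invented; only path, line, and body are used):

```shell
# One fake review comment in the shape the GitHub API returns.
COMMENTS='[{"path":"src/app.ts","line":42,"body":"Handle the null case."}]'
echo "$COMMENTS" | jq -r '.[] | "### \(.path):\(.line)\n\(.body)\n"'
# ### src/app.ts:42
# Handle the null case.
```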
This context document gets fed directly to Claude. The fixer handles three types of failures:
- CI failures: read the failing logs, fix the code, push new commits.
- Review comments requesting changes: address them and push.
- Merge conflicts: rebase onto the base branch and push with --force-with-lease.

To prevent the fixer from running in circles on the same PR, it queries active workflow runs and skips any PR that already has a fixer in progress. Each run is logged as a PR comment with the workflow link, so I can trace the full repair history.
gh pr comment "$PR_NUM" \
--body "🔧 Fixer run — [workflow run]($RUN_URL)"
The fixer also only touches autoagent branches. It checks the branch prefix before doing anything:
PR_BRANCH=$(gh pr view "$PR_NUM" --json headRefName -q '.headRefName')
if [[ ! "$PR_BRANCH" =~ ^autoagent/ ]]; then
  echo "Not an autoagent branch. Skipping."
  exit 0
fi
This means I can safely enable the check_suite trigger without worrying about the fixer touching human PRs.
The third body in the system. Every two hours, the Merger scans for autoagent PRs that are fully green: all CI checks pass, no changes requested, no merge conflicts. If a PR clears all gates, it gets merged automatically.
name: "[AUTOAGENT] Merger"
on:
  schedule:
    - cron: "0 */2 * * *"  # Every 2 hours
  workflow_dispatch:
concurrency:
  group: autoagent-merger
  cancel-in-progress: false
The Merger processes PRs sequentially for a reason. Merging PR A can create conflicts in PR B. After each merge, the Merger re-verifies the next PR’s status before proceeding. If a PR now has conflicts, it gets skipped. The Fixer picks up the rebase on its next cycle.
Before merging, Claude analyzes the PR’s review comments and recent commits to determine whether critical issues were actually addressed. This is smarter than keyword matching. A reviewer might request changes, the Fixer pushes a fix, but the review verdict still says “changes requested.” Claude reads the conversation and determines whether the underlying concern was resolved.
# Re-verify before each merge
MERGEABLE=$(gh pr view "$PR_NUM" --json mergeable -q '.mergeable')
if [ "$MERGEABLE" = "CONFLICTING" ]; then
  echo "Conflict detected after prior merge. Skipping."
  continue
fi

# Merge if all gates pass
gh pr merge "$PR_NUM" --merge
This is what completes the autonomous loop. The Implementer writes code, the Fixer repairs it, and the Merger ships it. Without the Merger, every PR still required a human to click the merge button, which defeated the purpose of full autonomy.
This workflow is event-driven, not scheduled. It listens for PR events on autoagent branches and moves the corresponding issue through the board columns:
# Extract issue number from branch name
# autoagent/353-wire-categorization-rules → 353
BRANCH="${{ github.head_ref }}"
ISSUE_NUM=$(echo "$BRANCH" | sed 's|autoagent/||' | grep -oE '^[0-9]+')
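The parsing is trivial to exercise standalone (grep -E is used because \d and -P are GNU-only and the runners are macOS):

```shell
# Strip the autoagent/ prefix, then keep the leading digits.
parse_issue() {
  printf '%s\n' "$1" | sed 's|autoagent/||' | grep -oE '^[0-9]+'
}
parse_issue "autoagent/353-wire-categorization-rules"   # prints 353
```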
The board update uses GitHub’s GraphQL API to set the Status field on the project item:
gh api graphql -f query="
mutation {
  updateProjectV2ItemFieldValue(input: {
    projectId: \"${PROJECT_ID}\"
    itemId: \"${ITEM_ID}\"
    fieldId: \"${FIELD_ID}\"
    value: { singleSelectOptionId: \"${READY_FOR_QA_ID}\" }
  }) { projectV2Item { id } }
}
"
Every state change also sends a Telegram notification. When a PR opens, I get a message with the issue title and a direct link. When it merges, a celebration emoji. When it closes without merge, a heads-up.
This is the glue that keeps the project board accurate without any manual board management. I haven’t dragged an issue card in weeks.
Every Sunday at 23:00 Berlin time, the rollover workflow runs. It creates the next calendar week milestone and moves all open issues into it, closing the old milestone if it’s empty.
I use GitHub milestones with a naming convention: 26 CW 14 (year, calendar week). The workflow calculates the current and next week numbers automatically:
YEAR=$(date -u +%y)
CURRENT_WEEK_NUM=$((10#$(date -u +%V)))
NEXT_WEEK_NUM=$((CURRENT_WEEK_NUM + 1))
CURRENT_TITLE="${YEAR} CW ${CURRENT_WEEK_NUM}"
NEXT_TITLE="${YEAR} CW ${NEXT_WEEK_NUM}"
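One edge worth guarding: the naive +1 has no valid successor in the last ISO week of the year (CW 52 or 53). A wraparound-safe sketch derives the label from a date seven days ahead instead of doing week arithmetic (GNU date syntax first, with the BSD/macOS -v form as a fallback; the zero-padded %V differs slightly from the 10#-stripped original):

```shell
# CW 52/53 rolls over to CW 01 automatically, because the label comes
# from an actual date rather than from incrementing a week number.
NEXT_TITLE=$(date -u -d "+7 days" "+%y CW %V" 2>/dev/null \
  || date -u -v+7d "+%y CW %V")
echo "$NEXT_TITLE"   # e.g. "26 CW 09"
```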
Creating the next milestone and moving issues is straightforward:
# Create next milestone
NEXT_MS_NUMBER=$(gh api "repos/$REPO/milestones" \
  -X POST \
  -f title="$NEXT_TITLE" \
  -f due_on="$NEXT_DUE" \
  -f state="open" \
  --jq '.number')

# Move each open issue
gh issue list --milestone "$CURRENT_TITLE" --state open \
  --json number | jq -r '.[].number' | while read -r ISSUE_NUM; do
  gh api "repos/$REPO/issues/$ISSUE_NUM" \
    -X PATCH -F milestone="$NEXT_MS_NUMBER" --silent
done
The result: I never manually create milestones or move issues between sprints. The board always reflects reality. Done issues stay in their original milestone for historical tracking. Open issues automatically carry forward.
OpenClaw remained as the conversational channel, now running on API keys. The split turned out to be more than a workaround. It maps to a real distinction in how agent work breaks down.
Some work is inherently conversational. One evening, for example, I asked my agent to disable all the agentic workflows on the repo, comment out the cron triggers, push directly to main, then create a new week rollover workflow. That involved reading each file, deciding what to comment out versus delete, and pushing multiple commits. Some edge cases required judgment: the Copilot workflow couldn’t be disabled via API, for instance. A 10-minute back-and-forth with decisions at every step.
You wouldn’t put that in a workflow file. It requires context and judgment, including the ability to ask “wait, should I also disable the cron jobs on the OpenClaw side?” Telegram is the right channel for that kind of work.
The split is: GitHub Actions for scheduled, repeatable, autonomous work; OpenClaw over Telegram for conversational work that needs judgment and back-and-forth.
This isn’t a compromise. It’s the right architecture. Some work needs autonomy, some needs conversation, and trying to force both into the same system makes both worse.
When you run autonomous agents, you lose direct visibility. They work in the background, on their own schedule. If they succeed, great. If they fail, silence.
Silence is the worst signal. You can’t tell the difference between “nothing happened” and “everything is broken.”
I learned this the hard way. For hours, my agents were stalling and I had no idea. They would pick an issue, move it to IN PROGRESS, and then the session would time out. From my side: silence. I assumed they were working. They were not. The only way I noticed was the absence of results in the reporting chain. When an agent that normally ships a PR every couple of hours goes quiet, something is wrong. But by the time you notice the absence, you’ve already lost those hours.
The fix was Telegram notifications at every stage of the pipeline.
Every workflow ends with a curl to the Telegram Bot API:
TELEGRAM_MSG="🤖 Autoagent: Dispatched #${ISSUE_NUM} — ${ISSUE_TITLE}"
# --data-urlencode keeps & and # in issue titles from mangling the payload
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  -d "chat_id=${CHAT_ID}" \
  --data-urlencode "text=${TELEGRAM_MSG}" > /dev/null 2>&1 || true
Once I had visibility, I could trust the system. Not because the agents became more reliable, but because I could see what they were doing. Trust comes from transparency, not from perfection.
Output channels are how agents report back. Every state transition produces a notification.
Input channels are how you steer agents without interrupting their workflow. You don’t message the agent directly. You change the environment it reads from: create an issue or update a label. The orchestrator picks up the change on its next run.
When you have 28 TODO issues with dependencies between them, a “pick the most isolated issue” heuristic breaks down fast. My orchestrator kept selecting #360 (integrate insights into daily recap) when #359 (build the insight generator) hadn’t been built yet. The dependency detection helps with explicit “Depends on” declarations, but not every dependency is documented that way.
The fix: six labels.
- p0: Drop everything. Critical bug or blocker.
- p1: High priority. Do this first.
- p2: Normal priority. Core feature work.
- p3: Medium priority. Important but not urgent.
- p4: Low priority. Nice to have.
- p5: Backlog. Get to it eventually.

The orchestrator uses the same sort_by logic shown earlier: priority first, description length as a tiebreaker. No dependency graph resolver or topological sort needed. Just labels.
Real example: I tagged #359 (Insight generator agent) as p0 and all the Phase 1 issues as p1. The agent immediately started working on #359 instead of getting confused by the dependency chain.
Reprioritization is trivial. Change a label, and the orchestrator adapts on its next hourly run.
The workflows use [self-hosted, macOS] runners. This was a deliberate choice, not a cost optimization.
I want full control over my machine. The self-hosted runner uses the same development environment with real dependencies. When the implementer runs pnpm test, it runs against the actual project setup. The API key is stored as a repository secret and injected at runtime, keeping authentication clean and aligned with Anthropic’s documented approach.
The benefits: the full development environment is already in place, with real dependencies installed, the gh CLI authenticated, and git configured.

The tradeoff is reliability. If my machine is off, the agents don’t run. For a solo developer, that’s fine. For a team, you’d want dedicated runner infrastructure.
After running this setup for several weeks, roughly 95% of dispatched issues made it to a merged PR without human intervention. The 5% that needed human help were typically issues with ambiguous requirements or edge cases that required domain knowledge the agent didn’t have. Writing better issue descriptions directly improves the success rate.
Write better issue descriptions from the start. The number one predictor of autonomous implementation success is the quality of the issue description. Vague issues produce vague PRs. Issues with clear acceptance criteria and references to existing code produce clean implementations.
Start with the fixer, not the implementer. The Fixer delivers value immediately on any project with CI. The implementer requires more setup (project board, labels, orchestrator). If I were starting over, I’d deploy the fixer first and add the rest incrementally.
Monitor token usage. Running Opus 4.6 with 500 max turns can get expensive. Track your API costs per workflow run. Sonnet works fine for straightforward issues and costs a fraction of what Opus does. The model dropdown in the workflow inputs lets you choose per run.
You don’t need a complex agent framework. GitHub Actions and Claude Code with a well-structured prompt is enough to build a full autonomous development pipeline. The orchestration is just shell scripts and GraphQL queries. The intelligence comes from the model, not the framework.
The entire system is six YAML files totaling about 800 lines of shell, with no dependencies beyond gh and jq. Every run is auditable in the Actions logs, version-controlled alongside the code it operates on. And when the platform you were using changes its rules overnight, having your agent infrastructure on GitHub’s own rails means you’re standing on solid ground.
It’s all open-source: leonardocardoso/three-body-agent. Copy the .github/workflows/ directory, configure your secrets, and you have the full pipeline.
Behold The Three-Body Agent