// Running Claude Code background agents in production for a team
// means owning the boring parts:
// [ ] Ephemeral per-task sandbox (cold-start in <5s, isolated FS)
// [ ] Git clone-with-token, branch policy, atomic commit + push
// [ ] Per-task token budget and dollar accounting (every task, every model)
// [ ] Retry on 429, 5xx, timeout, network drop, mid-stream truncation
// [ ] Streaming protocol that survives client disconnects
// [ ] Per-team API keys with scoped permissions
// [ ] Multi-repo orchestration (one task, N repos)
// [ ] Audit trail (who ran what, when, on which repo, at what cost)
// [ ] Pass-through billing if you embed this in your own product
// [ ] GitHub and Bitbucket parity
// What you actually want to write:
const result = await orchestra.tasks.create({
repo: 'acme/api',
task: 'Bump terraform-aws-provider from 5.x to 6.x; update breaking calls',
model: 'claude-sonnet-4-6',
team: 'platform',
budget: { maxUsd: 2.00 },
});In v2.0.60 of Claude Code, Anthropic introduced background agents, alongside the Sub-Agents model. The idea: instead of Claude Code being a turn-based assistant you watch run in a terminal, it becomes a parallel development environment. You launch agents that work on independent tasks in the background, monitor them from one place, and let them finish on their own.
Combined with the Claude Code Agent SDK, this gives a developer everything they need to write a script that spawns an autonomous coding loop: tool definitions, the agent harness, MCP server support, system prompt caching. It is a powerful primitive.
Owning that primitive in production for an engineering team is a different problem.
Claude Code runs in the directory you launch it from. For a team, you need an isolated, ephemeral filesystem per task. That means a sandbox manager (think E2B, Modal, Fly Machines), cold-start tuning so users do not wait 30 seconds, and a cleanup loop so old sandboxes do not pile up.
Clone with the user's token (which may rotate mid-run), check out a feature branch, watch the agent edit, atomically commit, push back. Detect when the push fails because the user's token was scoped wrong and surface a useful error, not a stack trace. Build the same path for GitHub and Bitbucket.
The SDK's default retry handles 429s well. Production needs more: 5xx exponential backoff, hard timeouts on time-to-first-byte (60s is too tight on big prompts), retry on network drops, surface mid-stream truncation as a typed error instead of a silent exit when the model hits its output cap.
A team UI streams every tool call and edit. The agent should not die because the user closed their laptop. A managed wrapper persists the event stream on the server, so reconnects pick up where they left off. The SDK does not give you this; you build it.
A platform team needs to enforce "this project gets $2 per task and 50,000 tokens per day." Anthropic's console gives you org-level rate limits, not per-project budgets with per-task caps and a refund path on failure. You build it.
"Run this dependency upgrade across 12 repos and open 12 PRs" is a normal Tuesday for a platform team. The Claude Code SDK is single-process, single-directory. Wrapping it into a fleet runner means a queue, idempotency, retries per task, and a dashboard.
Every task: who triggered it, which repo, which model, prompt and completion tokens, dollar cost, success, failure mode. For SOC 2 conversations and for engineering-lead retros. This is a database, an admin UI, and a warehouse pipe you build once.
Pass it per request. Never stored. Model tokens bill direct to your Anthropic account. Switch to Sonnet 4.5 today, 4.7 tomorrow, no migration.
Scoped per project, per env, per task type. Budget caps enforced server-side. Rotate without touching app code.
Ephemeral per-task working directory with node, npm, git, jq, ripgrep. Cold-start under 5 seconds. We run it, you do not.
SSE for every tool call, edit, commit. Persisted server-side so clients can disconnect and reconnect. Typed errors on truncation, timeout, push failure.
Every task is a row. Who, when, repo, model, tokens, dollars. Pipe to your warehouse. Pass-through bill your customers if you embed this in your own product.
Same payload opens a PR on either. Per-request token model so your end-users' credentials never leave the request boundary.
If your team has one or two engineers running Claude Code locally and the volume is a handful of tasks a week, build it yourself with the SDK. The SDK is genuinely good. The infra layer is solvable.
If you are an engineering org with multiple repos, multiple models in flight, an eng-lead asking for cost dashboards, a procurement team asking for SOC 2, and a product team asking to embed coding agents in your own SaaS, buy. The boring infrastructure is the only part of this stack that does not differentiate you. Spending three quarters of an engineer on a sandbox manager that does not move product is a bad trade.
That is what we sell. The Claude Code Agent SDK, wrapped in the production layer you would otherwise spend a quarter building.
Every plan is BYOK. Your Anthropic, OpenAI, or open-weights key. We charge for the agent loop, the sandbox, and the git push. Model tokens are billed by your provider, never marked up.
For trying it out on a side project.
For one developer shipping serious work.
For small teams building with the API in production.
For high-volume API embeds and enterprise contracts.
Tell us your team size, how you are running Claude Code today, and where the operational pain is. A human replies with a tailored walkthrough or, if you should just stay on the raw SDK, we will tell you that.