For platform teams running Claude Code

Claude Code Background Agent. Production-grade, for teams.

Anthropic gives you the Claude Code Agent SDK and a powerful Background Agents / Sub-Agents model. What they do not give you is the team-grade infrastructure to run it: per-team token budgets, multi-repo orchestration, audit trails, pass-through billing, and a streaming protocol that survives a Friday afternoon.

This page explains how Claude Code's Background Agent feature works, what shipping it to production for an engineering team actually takes, and where a managed wrapper earns its keep.

Request access See pricing

typescriptPOST a task → get a PR

// Running Claude Code background agents in production for a team
// means owning the boring parts:

//   [ ] Ephemeral per-task sandbox (cold-start in <5s, isolated FS)
//   [ ] Git clone-with-token, branch policy, atomic commit + push
//   [ ] Per-task token budget and dollar accounting (every task, every model)
//   [ ] Retry on 429, 5xx, timeout, network drop, mid-stream truncation
//   [ ] Streaming protocol that survives client disconnects
//   [ ] Per-team API keys with scoped permissions
//   [ ] Multi-repo orchestration (one task, N repos)
//   [ ] Audit trail (who ran what, when, on which repo, at what cost)
//   [ ] Pass-through billing if you embed this in your own product
//   [ ] GitHub and Bitbucket parity

// What you actually want to write:
const result = await orchestra.tasks.create({
  repo:   'acme/api',
  task:   'Bump terraform-aws-provider from 5.x to 6.x; update breaking calls',
  model:  'claude-sonnet-4-6',
  team:   'platform',
  budget: { maxUsd: 2.00 },
});

The feature

What are Claude Code background agents?

In v2.0.60 of Claude Code, Anthropic introduced background agents, alongside the Sub-Agents model. The idea: instead of Claude Code being a turn-based assistant you watch run in a terminal, it becomes a parallel development environment. You launch agents that work on independent tasks in the background, monitor them from one place, and let them finish on their own.

Combined with the Claude Code Agent SDK, this gives a developer everything they need to write a script that spawns an autonomous coding loop: tool definitions, the agent harness, MCP server support, system prompt caching. It is a powerful primitive.

Owning that primitive in production for an engineering team is a different problem.

The hard part

What "Claude Code background agent in production" actually means.

Ephemeral sandboxes, not local processes

Claude Code runs in the directory you launch it from. For a team, you need an isolated, ephemeral filesystem per task. That means a sandbox manager (think E2B, Modal, Fly Machines), cold-start tuning so users do not wait 30 seconds, and a cleanup loop so old sandboxes do not pile up.

Git plumbing that survives a flaky token

Clone with the user's token (which may rotate mid-run), check out a feature branch, watch the agent edit, atomically commit, push back. Detect when the push fails because the user's token was scoped wrong and surface a useful error, not a stack trace. Build the same path for GitHub and Bitbucket.

Retry the agent loop, not just the HTTP call

The SDK's default retry handles 429s well. Production needs more: 5xx exponential backoff, hard timeouts on time-to-first-byte (60s is too tight on big prompts), retry on network drops, surface mid-stream truncation as a typed error instead of a silent exit when the model hits its output cap.

Streaming that survives the client

A team UI streams every tool call and edit. The agent should not die because the user closed their laptop. A managed wrapper persists the event stream on the server, so reconnects pick up where they left off. The SDK does not give you this; you build it.

Per-team token economics

A platform team needs to enforce "this project gets $2 per task and 50,000 tokens per day." Anthropic's console gives you org-level rate limits, not per-project budgets with per-task caps and a refund path on failure. You build it.

Multi-repo, multi-task orchestration

"Run this dependency upgrade across 12 repos and open 12 PRs" is a normal Tuesday for a platform team. The Claude Code SDK is single-process, single-directory. Wrapping it into a fleet runner means a queue, idempotency, retries per task, and a dashboard.

Audit trail and observability

Every task: who triggered it, which repo, which model, prompt and completion tokens, dollar cost, success, failure mode. For SOC 2 conversations and for engineering-lead retros. This is a database, an admin UI, and a warehouse pipe you build once.

The managed wrapper

What you get when we own the boring parts.

Bring your own Anthropic key

Pass it per request. Never stored. Model tokens bill direct to your Anthropic account. Switch to Sonnet 4.5 today, 4.7 tomorrow, no migration.

Per-team API keys

Scoped per project, per env, per task type. Budget caps enforced server-side. Rotate without touching app code.

Sandbox infra, managed

Ephemeral per-task working directory with node, npm, git, jq, ripgrep. Cold-start under 5 seconds. We run it, you do not.

Resilient stream

SSE for every tool call, edit, commit. Persisted server-side so clients can disconnect and reconnect. Typed errors on truncation, timeout, push failure.

Audit + cost dashboards

Every task is a row. Who, when, repo, model, tokens, dollars. Pipe to your warehouse. Pass-through bill your customers if you embed this in your own product.

GitHub and Bitbucket parity

Same payload opens a PR on either. Per-request token model so your end-users' credentials never leave the request boundary.

Honest framing

Should you build it or buy it?

If your team has one or two engineers running Claude Code locally and the volume is a handful of tasks a week, build it yourself with the SDK. The SDK is genuinely good. The infra layer is solvable.

If you are an engineering org with multiple repos, multiple models in flight, an eng-lead asking for cost dashboards, a procurement team asking for SOC 2, and a product team asking to embed coding agents in your own SaaS, buy. The boring infrastructure is the only part of this stack that does not differentiate you. Spending three quarters of an engineer on a sandbox manager that does not move product is a bad trade.

That is what we sell. The Claude Code Agent SDK, wrapped in the production layer you would otherwise spend a quarter building.

Pricing

Pay per task. Bring your own key.

Every plan is BYOK. Your Anthropic, OpenAI, or open-weights key. We charge for the agent loop, the sandbox, and the git push. Model tokens are billed by your provider, never marked up.

Hobby

$0/month

For trying it out on a side project.

•20 tasks per month
•Bring your own model key
•1 user
•GitHub and Bitbucket
•Community support

Request access

Solo

$39/month

For one developer shipping serious work.

•500 tasks per month
•BYOK or pooled model credits
•1 user
•Webhook and streaming events
•Per-task receipts (token and dollar cost)
•Email support

Request access

Team

$149/month

For small teams building with the API in production.

•5,000 tasks per month
•Up to 10 users
•White-label per-request creds
•Org-level usage dashboards
•Priority support

Request access

Scale

Custom

For high-volume API embeds and enterprise contracts.

•Unlimited tasks
•Dedicated infra option
•Self-host option (EU or US)
•SSO and audit logs
•SLA and named contact

Talk to us

Request access

Tell us what you'd build.

Tell us your team size, how you are running Claude Code today, and where the operational pain is. A human replies with a tailored walkthrough or, if you should just stay on the raw SDK, we will tell you that.