Headless and CI/CD

Run Claude Code in non-interactive workflows with clear inputs, outputs, and guardrails.

Key takeaways

claude -p (--print) runs Claude Code non-interactively as the Agent SDK via the CLI; it works best for narrow tasks with machine-readable or easily reviewable output.
Start scripted runs with --bare so they skip auto-discovery of hooks, skills, MCP, and CLAUDE.md and behave the same everywhere — Anthropic's recommended (and future-default) CI mode.
Shape output with --output-format (text, json, stream-json) and force a schema with --json-schema; pipe input via stdin (capped at 10MB as of v2.1.128).
Recent builds (v2.1.181 through v2.1.190) improved structured-output determinism and non-interactive fallback behavior, but CI should still validate the schema and preserve raw output on failure.
Guard CI with least-privilege flags: --allowedTools, --permission-mode dontAsk, --max-turns, and --max-budget-usd; avoid bypassPermissions unless the environment is fully trusted.
Background sessions (--bg) and agent view manage long-running jobs, but slash commands like /run and /verify are interactive-only and cannot be invoked in -p mode.

Headless Claude Code works best when the task is narrow and the expected output is machine-readable or easy to review.

In the official documentation this mode is now called "Run Claude Code programmatically." Adding the -p (or --print) flag to any claude command runs it non-interactively, and Anthropic frames this as using the Agent SDK via the CLI — the same agent loop, tools, and context management that power interactive Claude Code. For full programmatic control with native message objects and tool approval callbacks, use the Python or TypeScript Agent SDK packages instead.

Agent SDK credit (effective June 15, 2026)

Starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive usage limits. Plan CI budgets accordingly. API key, Bedrock, Vertex, and Foundry usage is billed through those providers as usual.

Good Headless Tasks

Generate a changelog summary from a known diff.
Run a focused code review and return findings.
Update a repetitive documentation section.
Classify test failures with logs as input.
Produce a patch in a temporary branch for human review.

Avoid headless execution for broad architectural work unless a human will review each step.

Invocation Contract

Define input, output, and permissions:

claude -p "Review this diff for locale routing regressions. Return JSON findings only."

For scripts, prefer stable flags and structured output. Record the model, version, and command in CI logs so failures can be reproduced.
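One way to record the version and command is a thin wrapper around the call. The sketch below is illustrative only: run_logged is a hypothetical helper name, and the usage line assumes the diff is piped in and that read-only access is enough for the review.

```shell
# Hypothetical helper: print the CLI version and the exact command to stderr,
# then run the command unchanged so stdout still carries only Claude's output.
run_logged() {
  echo "claude version: $(claude --version 2>/dev/null || echo unknown)" >&2
  echo "command: $*" >&2
  "$@"
}

# Usage (prompt and tool scope are illustrative):
# git diff main | run_logged claude -p "Review this diff for locale routing regressions. Return JSON findings only." \
#   --output-format json --allowedTools "Read" > findings.json
```

Logging to stderr keeps the log lines out of any file or pipe that captures the result.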

Start scripts in bare mode

By default claude -p loads the same context an interactive session would — hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md from the working directory and ~/.claude. That makes scripted results depend on whatever happens to be configured on the machine. Add --bare so the run starts faster and produces the same result everywhere by skipping all of that auto-discovery:

claude --bare -p "Summarize this file" --allowedTools "Read"

In bare mode Claude only has the Bash, file read, and file edit tools, and only flags you pass explicitly take effect. Load context deliberately with --append-system-prompt / --append-system-prompt-file, --settings, --mcp-config, --agents, or --plugin-dir / --plugin-url. Bare mode also skips OAuth and keychain reads, so authentication must come from ANTHROPIC_API_KEY or an apiKeyHelper in the JSON passed to --settings (Bedrock, Vertex, and Foundry use their usual provider credentials).
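Because bare mode skips OAuth and keychain reads, a pre-flight check can fail the job early when no key is present. This is a minimal sketch that assumes authentication through ANTHROPIC_API_KEY; require_api_key is a hypothetical helper, ci/review-rules.md is a made-up path, and jobs that use an apiKeyHelper or a cloud provider would need a different check.

```shell
# Hypothetical pre-flight check: fail before calling claude if the key is absent.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; bare mode skips OAuth and keychain reads" >&2
    return 1
  fi
}

# Usage (ci/review-rules.md is an illustrative path, not a required location):
# require_api_key || exit 1
# claude --bare -p "Summarize this file" \
#   --append-system-prompt-file ci/review-rules.md \
#   --allowedTools "Read"
```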

Bare mode is the recommended CI default

Anthropic recommends --bare for scripted and SDK calls, and it will become the default for -p in a future release. Adopting it now keeps CI behavior stable across that change.

Use --safe-mode for troubleshooting broken customization rather than for reproducible CI. Unlike --bare, safe mode keeps authentication, model selection, built-in tools, and permissions working, but disables CLAUDE.md, skills, plugins, hooks, MCP servers, custom commands and agents, output styles, workflows, custom themes, keybindings, status line, file-suggestion commands, LSP servers, and auto-memory. It is useful when Fable 5 fallback, hook behavior, or plugin loading differs from a clean session.

Correlate runs with a session id

Use --session-id (which must be a valid UUID) when CI needs to correlate Claude Code output with a job, PR, or retry id:

claude -p "Summarize release risk as JSON." \
  --session-id "00000000-0000-4000-8000-000000000123" \
  --output-format json
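Since the value must be a valid UUID, a CI job id cannot be passed directly. The sketch below checks the format before the call; is_uuid is a hypothetical helper, and the usage lines assume uuidgen is installed on the runner and that CI_JOB_ID is whatever job identifier your CI system exposes.

```shell
# Hypothetical helper: succeed only if the argument has the 8-4-4-4-12 hex UUID shape.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage (assumes uuidgen exists; CI_JOB_ID is an illustrative variable name):
# session_id=$(uuidgen | tr 'A-Z' 'a-z')
# is_uuid "$session_id" || { echo "bad session id: $session_id" >&2; exit 1; }
# echo "job $CI_JOB_ID -> session $session_id" >&2
# claude -p "Summarize release risk as JSON." --session-id "$session_id" --output-format json
```

Logging the job-to-session mapping is what makes the correlation usable later, because the UUID alone carries no job information.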

For multi-step jobs, continue or resume instead of starting fresh: --continue (-c) loads the most recent conversation in the current directory, and --resume (-r) resumes a specific session by ID or name. Capture the id from a prior JSON result to resume it later:

session_id=$(claude -p "Start a review" --output-format json | jq -r '.session_id')
claude -p "Continue that review" --resume "$session_id"

Pipe data in, structure data out

Non-interactive mode reads stdin, so you can pipe input in and redirect output like any CLI tool. Piping a diff avoids needing Bash permission to read it:

git diff main | claude -p "you are a typo linter. report filename:line then the issue. return nothing else."

Piped stdin cap

As of Claude Code v2.1.128, piped stdin is capped at 10MB. Exceeding the cap exits with a clear error and a non-zero status. For larger inputs, write to a file and reference the file path in the prompt.
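A script can choose between piping and a file reference by checking the input size first. In this sketch fits_stdin is a hypothetical helper, and the threshold of 10,000,000 bytes is an assumption chosen to stay under the cap whichever way "10MB" is counted.

```shell
# Assumed conservative threshold; the exact byte value of the 10MB cap is not stated here.
STDIN_CAP_BYTES=10000000

# Hypothetical helper: succeed if the file is small enough to pipe on stdin.
fits_stdin() {
  size=$(($(wc -c < "$1")))   # arithmetic expansion strips any padding from wc
  [ "$size" -le "$STDIN_CAP_BYTES" ]
}

# Usage (changes.diff is an illustrative file name):
# if fits_stdin changes.diff; then
#   claude -p "Review this diff." < changes.diff
# else
#   claude -p "Review the diff stored in changes.diff." --allowedTools "Read"
# fi
```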

Control the output shape with --output-format:

text (default): plain text.
json: structured JSON with the response text in the result field, plus session_id, usage, and total_cost_usd (with a per-model cost breakdown) so callers can track spend per invocation.
stream-json: newline-delimited JSON events for real-time streaming (use with --verbose, and --include-partial-messages for token deltas).

To force schema-conforming output, add --json-schema with a JSON Schema definition; the structured result lands in the structured_output field:

claude -p "Extract the main function names from auth.py" \
  --output-format json \
  --json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}},"required":["functions"]}'

Releases v2.1.181 through v2.1.190 make schema output more deterministic in headless runs. Treat that as a reliability improvement, not as a replacement for validation: check that structured_output exists, validate it against your schema, and keep the raw JSON or stderr as a CI artifact when parsing fails.
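The presence check can be sketched with jq, which this page already uses. extract_structured is a hypothetical helper; it only confirms that structured_output is a JSON object and leaves full schema validation to a validator of your choice.

```shell
# Hypothetical helper: print structured_output from a saved JSON result, or fail
# and point at the raw file so it can be uploaded as a CI artifact.
extract_structured() {
  raw_file=$1
  if ! jq -e '.structured_output | type == "object"' "$raw_file" >/dev/null 2>&1; then
    echo "structured_output missing or not an object; raw output kept at $raw_file" >&2
    return 1
  fi
  jq -c '.structured_output' "$raw_file"
}

# Usage (file names are illustrative; $schema holds your JSON Schema string):
# claude -p "Extract the main function names from auth.py" \
#   --output-format json --json-schema "$schema" > raw.json 2> stderr.log
# extract_structured raw.json > functions.json || exit 1
```

Writing the raw result to a file before parsing is the design choice that matters: a failed parse still leaves something to inspect.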

Background sessions

Use background sessions when a job is long-running and should be attachable. Pass --bg to start a session that goes straight to the background and returns immediately, printing the session's short ID and the commands for managing it:

claude --bg --name "flaky-test-fix" "Investigate SettingsChangeDetector flakes"
claude agents          # open agent view (interactive terminal)
claude agents --json   # print live sessions as JSON for scripting
claude logs <id>       # print a session's recent output
claude attach <id>     # attach to a session in this terminal
claude stop <id>       # stop a session (also: claude kill)

--name (also -n) sets the display name shown in agent view; without it the name is generated from the prompt. Manage sessions from the shell with claude agents, claude logs <id>, claude attach <id>, claude stop <id>, claude respawn <id>, and claude rm <id>. Each session's short ID is its directory name under ~/.claude/jobs/.

Claude Code v2.1.170 fixed a transcript persistence issue where sessions launched from the VS Code integrated terminal, or shells inheriting Claude Code environment variables, could fail to appear in --resume. If a teammate reports missing sessions from those environments, verify they are on v2.1.170 or later before treating it as user error.

Use claude --bg --exec '<command>' when you want a shell command to appear in agent view as a PTY-backed job without invoking a model. Its captured output stays in memory (not written to disk) and cleans up about five minutes after the command exits, so read it before then:

claude --bg --exec 'pytest -x'

Agent view is a research preview

Agent view requires Claude Code v2.1.139 or later (claude agents --cwd requires v2.1.141). Before editing files, a background session moves into an isolated git worktree under .claude/worktrees/; set worktree.bgIsolation to "none" (v2.1.143+) to edit the working copy directly. Background sessions run locally, consume your subscription quota per session, and are preserved across sleep but stop on machine shutdown.

CI Guardrails

Run in a clean checkout, and start with --bare so the run does not pick up machine-local config.
Use least-privilege credentials. For unattended CI, generate a long-lived token with claude setup-token instead of relying on interactive OAuth.
Scope tools explicitly. --allowedTools lists tools that run without prompting (using permission rule syntax, e.g. "Bash(git diff *)"), and --tools restricts which built-in tools exist at all.
For a locked-down baseline, pass --permission-mode dontAsk, which denies anything not in your permissions.allow rules or the read-only command set. acceptEdits auto-approves file writes plus common filesystem commands but still aborts on other shell or network calls unless allowed. Avoid bypassPermissions (--dangerously-skip-permissions) in CI unless the environment is fully trusted.
Bound the run with --max-turns N (exits with an error at the limit) and --max-budget-usd N (stops spending past the cap). Both are print-mode only.
Block production writes unless explicitly approved.
Store generated patches as artifacts.
Require human review before merge.
Fail closed when configuration cannot be loaded.
Pin the model for reproducible automation instead of relying on moving aliases, and consider --fallback-model so a retired or overloaded default does not break the pipeline.
Set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 in locked-down environments where nonessential network traffic is not allowed. This is equivalent to setting DISABLE_AUTOUPDATER, DISABLE_BUG_COMMAND, DISABLE_ERROR_REPORTING, and DISABLE_TELEMETRY together.
For sandboxed CI, manage the newer sandbox.credentials policy alongside sandbox.failIfUnavailable and network allow/deny rules. Releases v2.1.181 through v2.1.190 also improve destructive-command approval and native Windows PowerShell sandbox behavior, so test both Bash and PowerShell paths when Windows runners are in scope.
On self-hosted runners, use the v2.1.169 post-session lifecycle hook when you need to snapshot uncommitted work or export logs after a Claude session ends but before the workspace is deleted. The child-process SIGTERM-to-SIGKILL window is configurable; the default remains five seconds.
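Several of the guardrails above can be combined in one wrapper. This is a sketch under stated assumptions, not a recommended baseline: guarded_claude and the CI_* variable names are hypothetical, the tool scope and limits are placeholders, and the pinned model id must come from your own configuration.

```shell
# Hypothetical wrapper: a bare, least-privilege, bounded print-mode call.
# CI_CLAUDE_MODEL must be set to a pinned model id; the call aborts if it is not.
guarded_claude() {
  prompt=$1
  claude --bare -p "$prompt" \
    --model "${CI_CLAUDE_MODEL:?set CI_CLAUDE_MODEL to a pinned model id}" \
    --allowedTools "Read,Bash(git diff *)" \
    --permission-mode dontAsk \
    --max-turns "${CI_MAX_TURNS:-5}" \
    --max-budget-usd "${CI_MAX_BUDGET_USD:-1.00}" \
    --output-format json
}

# Usage (patch-review.json is an illustrative artifact name):
# guarded_claude "Review the staged diff and report findings." > patch-review.json
```

Keeping the limits in environment variables lets each pipeline tighten them without editing the wrapper.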

When streaming, the system/init event reports the model, tools, MCP servers, and loaded plugins; its plugin_errors field lets CI fail when a plugin did not load. A system/api_retry event is emitted before a retryable request is retried, so you can surface retry progress.
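A CI step could gate on plugin_errors roughly as follows. The sketch assumes each event is a JSON object with type and subtype fields (so system/init means type "system" and subtype "init") and that plugin_errors is a list; check_plugin_errors is a hypothetical helper, and the field layout should be confirmed against real output before relying on it.

```shell
# Hypothetical helper: read stream-json events on stdin and fail if any
# system/init event reports a non-empty plugin_errors list.
check_plugin_errors() {
  jq -e -s '
    [ .[]
      | select(.type == "system" and .subtype == "init")
      | (.plugin_errors // [])
      | length ]
    | add // 0
    | . == 0' >/dev/null
}

# Usage (events.jsonl is an illustrative artifact name):
# claude -p "Summarize this file" --output-format stream-json --verbose \
#   | tee events.jsonl | check_plugin_errors || exit 1
```

Because the helper reads the whole stream before deciding, it reports after the run ends; tee keeps the raw events for later inspection.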

Long-running MCP tools in headless jobs should set CLAUDE_CODE_MCP_TOOL_IDLE_TIMEOUT deliberately instead of relying on the default. Recent releases also reduced misleading MCP status reporting, but automation should still judge MCP health from command exit codes and logs rather than the interactive /mcp view.

Run And Verify Skills

Current Claude Code includes the /run and /verify bundled skills, plus /run-skill-generator, for app-level checks. /run and /verify infer the launch from your project type (CLI, server, TUI, browser-driven) and from your README, package.json, or Makefile. They are useful when a change must be observed in a running app rather than inferred from tests alone.

/run-skill-generator runs once per project (and again when the build or launch process changes): it gets the app running from a clean environment, captures what worked, and commits it as a per-project skill under .claude/skills/run-<name>/ so later runs follow the recorded recipe instead of rediscovering it.

Slash commands are interactive-only

User-invoked skills and built-in commands such as /run, /verify, and /code-review are only available in interactive mode. In -p (headless) mode they cannot be called as slash commands — describe the task you want accomplished in the prompt instead. In CI, only rely on these after the project-specific run skill can build and launch from a clean checkout.

Output Review

Headless output should answer:

What files or commits were inspected?
What exact issue was found?
What evidence supports it?
What command or test confirms the result?

If the output cannot support those questions, narrow the prompt or add structured reporting.
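One form of structured reporting is a schema with a field per question. The sketch below is illustrative: review_schema is a hypothetical helper, and the field names inspected, issue, evidence, and confirm_command are invented for this example.

```shell
# Hypothetical helper: print a JSON Schema with one required field per review question.
review_schema() {
  cat <<'EOF'
{"type":"object","properties":{"inspected":{"type":"array","items":{"type":"string"}},"issue":{"type":"string"},"evidence":{"type":"string"},"confirm_command":{"type":"string"}},"required":["inspected","issue","evidence","confirm_command"]}
EOF
}

# Usage (the prompt is illustrative):
# git diff main | claude -p "Review this diff and report one finding." \
#   --output-format json --json-schema "$(review_schema)"
```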
