Verification Report
Link, consistency, source, and static validation report for the Harness Engineering handbook.
This document records structure, source, cross-link, and handbook app validation for the English Harness Engineering handbook.
Verification Baseline
2026-05-23
Scope
| Item | Standard |
|---|---|
| Document structure | `meta.json` matches the actual MDX files |
| Content consistency | Core claims and chapter flow do not conflict |
| External evidence | OpenAI, Anthropic, Toss, gstack, and revfactory/harness claims are checked |
| Cross-links | Related handbook links are valid |
| App validation | handbook registry, typecheck, and build pass |
Method
- Compared `apps/handbook/content/books/en/harness-engineering/meta.json` with the MDX file list.
- Compared chapter claims against the source material.
- Checked related links to LLMOps/AgentOps, Codex, Claude Code, orchestration, and documentation books.
- Used the OpenAI developer docs MCP and official OpenAI sources to check OpenAI items.
- Cross-checked Anthropic engineering and news posts with the official Claude Code and Claude Managed Agents docs.
- Checked the `gstack` and `revfactory/harness` README state and GitHub metadata during the Korean baseline update.
- Ran static validation (books registry check, typecheck, and build).
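The first method step compares `meta.json` with the MDX file list. That comparison amounts to a set difference in both directions, plus a check for slugs listed twice. The sketch below is minimal and makes two assumptions: `meta.json` follows the Fumadocs-style convention of a `pages` array of slugs, and the caller has already listed the book folder's `.mdx` files. The handbook's real schema may differ. `checkMetaAlignment` is a hypothetical helper, not one of the handbook's scripts.

```typescript
// Assumed meta.json shape: a Fumadocs-style `pages` array of page slugs (hypothetical).
interface BookMeta {
  title?: string;
  pages: string[];
}

interface AlignmentResult {
  missingFiles: string[]; // listed in meta.json but no matching .mdx file
  unlistedFiles: string[]; // .mdx file present but not listed in meta.json
  duplicates: string[]; // one entry per repeated listing in meta.json
}

// Compare the slugs declared in meta.json with the .mdx filenames in the book folder.
function checkMetaAlignment(meta: BookMeta, mdxFiles: string[]): AlignmentResult {
  // Strip the extension so "index.mdx" compares equal to the slug "index".
  const fileSlugs = new Set(mdxFiles.map((file) => file.replace(/\.mdx$/, "")));
  const listed = new Set<string>();
  const duplicates: string[] = [];
  for (const slug of meta.pages) {
    if (listed.has(slug)) duplicates.push(slug);
    listed.add(slug);
  }
  return {
    missingFiles: Array.from(listed).filter((slug) => !fileSlugs.has(slug)),
    unlistedFiles: Array.from(fileSlugs).filter((slug) => !listed.has(slug)),
    duplicates,
  };
}

// Illustrative slugs only; these are not the book's real page names.
const result = checkMetaAlignment(
  { pages: ["index", "why-harness", "verification"] },
  ["index.mdx", "why-harness.mdx", "verification.mdx"],
);
const aligned =
  result.missingFiles.length === 0 &&
  result.unlistedFiles.length === 0 &&
  result.duplicates.length === 0;
console.log(aligned ? "aligned" : JSON.stringify(result)); // prints "aligned"
```

On the real book, this check would receive the parsed `meta.json` and a directory listing. A result like "23 pages aligned" would correspond to all three lists coming back empty. If the schema allows non-page entries such as separators, a real check would also have to filter them out.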
Result Summary
| Item | Result |
|---|---|
| `meta.json` and MDX files | 23 pages aligned |
| Structure flow | Pass |
| External evidence connection | Pass |
| Cross-links | Pass |
| `pnpm --filter handbook run check:books-registry` | Pass |
| `pnpm --filter handbook run typecheck` | Pass |
| `pnpm --filter handbook run build` | Pass |
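The Cross-links row can be reproduced by extracting internal link targets from each page and resolving them against the set of known routes. The sketch below is minimal and assumes two things: internal handbook links are Markdown links with absolute paths under `/books/`, and the caller supplies the valid routes. The route format and the `findBrokenLinks` helper are hypothetical, not the handbook's actual link checker.

```typescript
// A link whose target is not a known handbook route.
interface BrokenLink {
  page: string; // route of the page that contains the link
  target: string; // unresolved link target, anchor stripped
}

// Matches Markdown links such as [Codex](/books/en/codex#setup) and captures the path.
// Assumption: internal links are absolute paths under "/books/"; external URLs never match.
const INTERNAL_LINK = /\]\((\/books\/[^)\s#]+)(?:#[^)\s]*)?\)/g;

function findBrokenLinks(
  pages: Record<string, string>, // page route -> raw MDX source
  knownRoutes: Set<string>,
): BrokenLink[] {
  const broken: BrokenLink[] = [];
  for (const [page, source] of Object.entries(pages)) {
    for (const match of source.matchAll(INTERNAL_LINK)) {
      // Drop a trailing slash so "/books/en/codex/" resolves like "/books/en/codex".
      const target = match[1].replace(/\/$/, "");
      if (!knownRoutes.has(target)) broken.push({ page, target });
    }
  }
  return broken;
}

// Illustrative routes only; the handbook's real URL scheme may differ.
const brokenLinks = findBrokenLinks(
  {
    "/books/en/harness-engineering": "See [Codex](/books/en/codex) and [old](/books/en/removed#intro).",
  },
  new Set(["/books/en/harness-engineering", "/books/en/codex"]),
);
// one entry: "/books/en/harness-engineering -> /books/en/removed"
console.log(brokenLinks.map((link) => `${link.page} -> ${link.target}`));
```

This sketch checks only that the target page exists. It does not verify anchors, and it skips relative links and links rendered through JSX components, which a real checker would also need to cover.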
Core Sources
| Source | Date | Use in this book |
|---|---|---|
| OpenAI, Harness Engineering | 2026-02-11 | agent-readable repo, short AGENTS.md, structured docs, observability, garbage collection |
| OpenAI, The next evolution of the Agents SDK | 2026-04-15 | model-native harness, native sandbox execution, MCP/skills/AGENTS.md/shell/apply_patch primitives |
| OpenAI API Changelog | 2026-05-06 / 2026-05-19 | TypeScript sandbox agents, open-source harness, Secure MCP Tunnel |
| OpenAI Developers plugin for Codex | 2026-05-07 | OpenAI Platform access, API key setup, troubleshooting as plugin surface |
| OpenAI, Work with Codex from anywhere | 2026-05-14 | mobile/remote connection, approvals, hooks, enterprise environment |
| OpenAI Agents SDK / Sandbox / Codex docs | Read baseline 2026-05-23 | sandbox capability, hooks lifecycle, remote connections |
| Anthropic, Harness design for long-running application development | 2026-03-24 | planner/generator/evaluator and load-bearing scaffolding |
| Anthropic, Claude Code auto mode | 2026-03-25 | prompt-injection probe, transcript classifier, trust boundary, denial fallback |
| Claude Code permission / auto mode docs | Read baseline 2026-05-23 | permission modes, protected paths, classifier order, trusted infrastructure |
| Anthropic, Scaling Managed Agents | 2026-04-08 | session/harness/sandbox split, durable event log, credential vault, MCP proxy |
| Claude Managed Agents overview / MCP connector docs | Read baseline 2026-05-23 | agents, environments, sessions, events, MCP auth/vault separation |
| Anthropic, Agents for financial services | 2026-05-05 | domain templates, skills/connectors/subagents, per-tool permissions, audit log |
| Toss harness article | 2026-02-26 | frictionless harness, executable SSOT, domain layer, HITL |
| gstack README | Read baseline 2026-05-23 | specialists, power tools, agent hosts, team mode, QA, checkpoint, learning |
| revfactory/harness README | Read baseline 2026-05-23 | L3 Meta-Factory, Team-Architecture Factory, architecture patterns, A/B caveat |
Synthesized Claims
| Claim | Evidence basis |
|---|---|
| Harnesses are work-system design, not prompt tricks | OpenAI repo/observability + Anthropic evaluation + Toss system rollout |
| Generic harnesses are starting points | Toss domain layers + gstack make-it-yours posture + revfactory domain teams |
| Operations and cleanup are part of the harness | OpenAI entropy and doc gardening view |
| Harness primitives are being productized, but domain design remains | OpenAI primitives + gstack/revfactory domain workflows |
| Auto approval is a policy layer, not a human-review replacement | Anthropic auto mode classifier/trust-boundary structure |
| Long-running agent runtime should use durable session logs | Anthropic Managed Agents session/harness/sandbox split |
Limitations
Scope
This report reflects the 2026-05-23 source baseline. External tools and docs can change quickly, so any change in interpretation should be recorded in `updates.mdx`.