Case: OpenAI
Analyze OpenAI's harness view through repo-readability, observability, sandboxing, runtime surface, and cleanup.
Key takeaways
- OpenAI's core idea is that the agent's workplace is the whole repository plus its connected runtime surface, not the prompt window.
- Read the case as two layers: make the repo and docs agent-readable, then standardize the runtime through MCP, skills, sandbox, hooks, approvals, and plugins.
- A short AGENTS.md acts as a table of contents that fixes where the agent starts and what it trusts, with detail living in structured docs/.
- Browser, logs, and metrics separate "code looks right" from "behavior is right," catching runtime failures text review misses.
- Stale-doc cleanup manages entropy, while Secure MCP Tunnel and remote approvals keep private tools connected without public exposure.
The key idea in OpenAI's example is simple: the agent's workplace is not the prompt window. It is the whole repository and its connected runtime surface.
What to Watch
OpenAI treats the harness less like an instruction template and more like the combination of
AGENTS.md + docs + browser + logs + metrics + sandbox + hooks + cleanup.
What the 2026 Updates Added
The February article emphasized making the repo agent-readable. Later Agents SDK and Codex updates made that view more concrete as platform primitives.
| Update | Harness meaning |
|---|---|
| Agents SDK model-native harness | Agent loop, tool use, memory, and sandbox orchestration become SDK structure |
| Native sandbox execution | Files, commands, dependencies, and output directories are provided through a controlled workspace |
| MCP / skills / AGENTS.md / shell / apply_patch | Common primitives for frontier agent systems become shared surfaces |
| TypeScript sandbox agents + open-source harness | The pattern extends beyond Python experiments into web and TypeScript stacks |
| Codex remote connections / hooks | Long-running approval, redirection, validation, and logging persist across devices and environments |
| Secure MCP Tunnel | Private or on-prem MCP tools connect without direct public internet exposure |
| OpenAI Developers plugin for Codex | OpenAI Platform access, API key setup, and API troubleshooting become a plugin surface |
Read the OpenAI case as two layers:
- Make the repo and docs agent-readable.
- Standardize the runtime surface through MCP, skills, sandbox, hooks, approvals, and plugins.
Problems Solved
- The agent does not know where to start in a large repo.
- Architecture rules are scattered across documents.
- Code looks correct, but UI or runtime behavior fails.
- Stale docs keep poisoning future context.
- Internal tools and sensitive data are needed, but execution and credential boundaries must stay separate.
Technical Mechanism
The harness does three things:
- Constrains search order by defining first-read docs.
- Connects execution observation through browser, logs, and metrics.
- Manages entropy by removing stale docs.
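
The third item, entropy management, is the easiest to automate. The sketch below is one hypothetical way to list cleanup candidates and is not taken from OpenAI's article: the function name `find_stale_docs` and the age threshold are assumptions, and file modification time is only a rough proxy for staleness (a fresh checkout resets it, so commit history would be a more reliable signal).

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional


def find_stale_docs(
    docs_dir: Path, max_age_days: int, now: Optional[datetime] = None
) -> List[Path]:
    """Return Markdown files under docs_dir last modified more than max_age_days ago.

    Hypothetical helper: mtime is an assumed staleness signal, not a rule
    from the source article.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    # sorted() keeps the report order stable across runs
    for path in sorted(docs_dir.rglob("*.md")):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified < cutoff:
            stale.append(path)
    return stale
```

A scheduled job could call `find_stale_docs(Path("docs"), 90)` and open a review task for each returned file rather than deleting anything automatically.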
Why This Is Engineering
| Design element | Failure mode changed |
|---|---|
| Short AGENTS.md | Unstable starting point |
| Structured docs/ | Important design information depends on search luck |
| MCP / skills | Tools and knowledge are exposed without giant prompts |
| sandbox / Manifest | Execution environment, inputs, outputs, and permissions become predictable |
| browser / logs | "Code looks right" is separated from "behavior is right" |
| hooks / approvals | Lifecycle validation, logging, and escalation to a human are triggered automatically |
| plugin | Repeated provider setup and troubleshooting become installable execution paths |
| cleanup | Stale docs stop contaminating the next session |
Minimal Example
# AGENTS.md
## Start Here
- Read `docs/architecture.md`
- Read `docs/invariants.md`
- Then inspect the relevant app directory
## Non-negotiables
- Do not bypass browser QA for UI changes
- Run lint/typecheck before finishing
- If schema or auth changes appear, escalate to human review
docs/
├── architecture.md
├── invariants.md
├── runbooks.md
└── glossary.md
The value is not document volume. It is fixing where the agent starts and what it trusts.
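
A table-of-contents file only fixes the starting point if the paths it names actually exist. As a minimal sketch, assuming first-read docs are written as backticked `.md` paths like the ones in the example above, a check such as the following could run in CI. The function names `first_read_paths` and `missing_first_reads` are hypothetical and not part of any OpenAI tooling.

```python
import re
from pathlib import Path
from typing import List

# Assumption: first-read docs appear as backticked paths ending in .md
BACKTICK_MD_PATH = re.compile(r"`([^`]+\.md)`")


def first_read_paths(agents_md_text: str) -> List[str]:
    """Collect backticked .md paths in the order they appear."""
    return BACKTICK_MD_PATH.findall(agents_md_text)


def missing_first_reads(repo_root: Path, agents_md_text: str) -> List[str]:
    """Return referenced doc paths that do not exist under repo_root."""
    return [
        rel_path
        for rel_path in first_read_paths(agents_md_text)
        if not (repo_root / rel_path).is_file()
    ]
```

Failing the build when `missing_first_reads` returns a non-empty list keeps a renamed or deleted doc from silently breaking the agent's starting path.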
What to Borrow
| Borrow | Why |
|---|---|
| Use AGENTS.md as a short TOC | Starting path matters more than background length |
| Split docs/invariants.md | Hard architecture rules must be easy to find |
| Govern MCP / skills / hooks | Tool access and lifecycle automation drift like docs |
| Standardize provider setup plugin | API key creation, platform access, and troubleshooting should not depend on personal memory |
| Design sandbox boundaries by task type | Sensitive data, dependencies, and side effects need control |
| Promote browser QA | "Looks implemented" and "works" are different |
| Schedule stale-doc cleanup | Input quality is maintained over time |
Translation to Your Team
Ask:
- What should the agent read in the first three minutes?
- Which MCP, skills, shell, and filesystem permissions are open in the sandbox?
- Which browser/log/test checks are mandatory?
- Which lifecycle events should hooks and remote approvals capture?
- Which provider setup and API troubleshooting steps should become a plugin or a skill?
- Who deletes stale docs, and when?
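
Answers to the questions above can be written down as a small policy instead of living in people's heads. The following is a hypothetical sketch, not the Codex hooks API: `ReviewPolicy`, `required_checks`, and the glob patterns are all assumptions chosen for illustration. It maps a set of changed file paths to the checks that must run before a task counts as finished.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReviewPolicy:
    # Hypothetical patterns; deliberately coarse, so tune them per repository
    escalate_globs: Tuple[str, ...] = ("*schema*", "*auth*")
    ui_globs: Tuple[str, ...] = ("*.tsx", "*.css")


def _matches_any(paths: Iterable[str], globs: Tuple[str, ...]) -> bool:
    return any(fnmatch(path, glob) for path in paths for glob in globs)


def required_checks(
    changed_paths: Iterable[str], policy: ReviewPolicy = ReviewPolicy()
) -> List[str]:
    """Map changed paths to the checks that must pass before finishing."""
    paths = list(changed_paths)
    checks = ["lint", "typecheck"]  # always required
    if _matches_any(paths, policy.ui_globs):
        checks.append("browser-qa")  # UI changes need runtime verification
    if _matches_any(paths, policy.escalate_globs):
        checks.append("human-review")  # schema or auth changes escalate
    return checks
```

For example, `required_checks(["db/schema.sql"])` returns `["lint", "typecheck", "human-review"]`. Keeping the policy in code means it can be reviewed, versioned, and cleaned up like any other doc.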
References
- OpenAI, Harness Engineering, 2026-02-11
- OpenAI, The next evolution of the Agents SDK, 2026-04-15
- OpenAI API Changelog, 2026-05-06 / 2026-05-19
- OpenAI, Work with Codex from anywhere, 2026-05-14
- OpenAI, OpenAI Developers plugin for Codex, 2026-05-07
- OpenAI Codex Hooks / Remote Connections docs, read baseline 2026-05-23