Case: OpenAI

Analyze OpenAI's harness view through repo-readability, observability, sandboxing, runtime surface, and cleanup.

Key takeaways

OpenAI's core idea is that the agent's workplace is the whole repository plus its connected runtime surface, not the prompt window.
Read the case as two layers: make the repo and docs agent-readable, then standardize the runtime through MCP, skills, sandbox, hooks, approvals, and plugins.
A short AGENTS.md acts as a table of contents that fixes where the agent starts and what it trusts, with detail living in structured docs/.
Browser, logs, and metrics separate "code looks right" from "behavior is right," catching runtime failures text review misses.
Stale-doc cleanup manages entropy, while Secure MCP Tunnel and remote approvals keep private tools connected without public exposure.

The key idea in OpenAI's example is simple: the agent's workplace is not the prompt window. It is the whole repository and its connected runtime surface.

What to Watch

OpenAI treats the harness less as an instruction template than as the combination of `AGENTS.md`, docs, browser, logs, metrics, sandbox, hooks, and cleanup.

What the 2026 Updates Added

The February article emphasized making the repo agent-readable. Later Agents SDK and Codex updates made that view more concrete as platform primitives.

Update	Harness meaning
Agents SDK model-native harness	Agent loop, tool use, memory, and sandbox orchestration become SDK structure
Native sandbox execution	Files, commands, dependencies, and output directories are provided through a controlled workspace
MCP / skills / `AGENTS.md` / shell / `apply_patch`	Common primitives for frontier agent systems become shared surfaces
TypeScript sandbox agents + open-source harness	The pattern extends beyond Python experiments into web and TypeScript stacks
Codex remote connections / hooks	Long-running approval, redirection, validation, and logging persist across devices and environments
Secure MCP Tunnel	Private or on-prem MCP tools connect without direct public internet exposure
OpenAI Developers plugin for Codex	OpenAI Platform access, API key setup, and API troubleshooting become a plugin surface

Read the OpenAI case as two layers:

Make the repo and docs agent-readable.
Standardize the runtime surface through MCP, skills, sandbox, hooks, approvals, and plugins.

Problems Solved

The agent does not know where to start in a large repo.
Architecture rules are scattered across documents.
Code looks correct, but UI or runtime behavior fails.
Stale docs keep poisoning future context.
Internal tools and sensitive data are needed, but execution and credential boundaries must stay separate.

Technical Mechanism

The harness does three things:

Constrains search order by defining first-read docs.
Connects execution observation through browser, logs, and metrics.
Manages entropy by removing stale docs.
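
The first mechanism, constraining search order, can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: the function name `reading_order` and the example paths are assumptions made for this sketch. It shows declared first-read docs always preceding whatever a search happens to surface.

```python
from typing import Iterable, List


def reading_order(first_read: Iterable[str], search_hits: Iterable[str]) -> List[str]:
    """Return paths in the order the agent should read them.

    Hypothetical helper: first-read docs come first, in their declared
    order; search hits follow, with duplicates dropped so no file is
    read twice.
    """
    ordered: List[str] = []
    seen = set()
    for path in list(first_read) + list(search_hits):
        if path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered


if __name__ == "__main__":
    order = reading_order(
        first_read=["docs/architecture.md", "docs/invariants.md"],
        # Illustrative search results; one overlaps with a first-read doc.
        search_hits=["app/billing/README.md", "docs/invariants.md"],
    )
    for path in order:
        print(path)
```

The point of the design is that search luck only affects what comes after the fixed starting path, never the starting path itself.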

Why This Is Engineering

Design element	Failure mode changed
Short `AGENTS.md`	Unstable starting point
Structured `docs/`	Important design information depends on search luck
MCP / skills	Tools and knowledge are exposed without giant prompts
sandbox / Manifest	Execution environment, inputs, outputs, and permissions become predictable
browser / logs	"Code looks right" is separated from "behavior is right"
hooks / approvals	Lifecycle validation, logging, and human intervention are automated
plugin	Repeated provider setup and troubleshooting become installable execution paths
cleanup	Stale docs stop contaminating the next session
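
To make the sandbox / Manifest row concrete, here is a rough sketch of what "predictable inputs, outputs, and permissions" could mean in code. The `SandboxManifest` class, its field names, and the default directories and commands are all hypothetical; this is not the Agents SDK schema, only an illustration of the idea.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SandboxManifest:
    """Hypothetical manifest for one task type (not an official schema)."""

    input_dirs: Tuple[str, ...] = ("repo",)      # readable only
    output_dirs: Tuple[str, ...] = ("out",)      # readable and writable
    allowed_commands: Tuple[str, ...] = ("pytest", "ruff")

    def _inside(self, path: str, roots: Tuple[str, ...]) -> bool:
        # Normalize first so "out/../secrets" cannot escape a root.
        normalized = PurePosixPath(posixpath.normpath(path))
        if normalized.is_absolute():
            return False
        return any(
            normalized == PurePosixPath(root)
            or PurePosixPath(root) in normalized.parents
            for root in roots
        )

    def can_read(self, path: str) -> bool:
        return self._inside(path, self.input_dirs + self.output_dirs)

    def can_write(self, path: str) -> bool:
        return self._inside(path, self.output_dirs)

    def can_run(self, argv: Sequence[str]) -> bool:
        # Only the executable name is checked in this sketch.
        return bool(argv) and argv[0] in self.allowed_commands


if __name__ == "__main__":
    manifest = SandboxManifest()
    print(manifest.can_write("out/report.json"))
    print(manifest.can_write("repo/src/app.py"))
    print(manifest.can_run(["curl", "https://example.com"]))
```

Declaring the boundary as data rather than prose is what lets it be reviewed, versioned, and varied by task type.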

Minimal Example

    # AGENTS.md

    ## Start Here
    - Read `docs/architecture.md`
    - Read `docs/invariants.md`
    - Then inspect the relevant app directory

    ## Non-negotiables
    - Do not bypass browser QA for UI changes
    - Run lint/typecheck before finishing
    - If schema or auth changes appear, escalate to human review

    docs/
    ├── architecture.md
    ├── invariants.md
    ├── runbooks.md
    └── glossary.md

The value is not document volume. It is fixing where the agent starts and what it trusts.
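
A table of contents is only trustworthy if its entries resolve. One way to guard that, sketched here under assumptions, is a small check that every `docs/...` path mentioned in backticks in `AGENTS.md` exists in the repo. The helper names `referenced_docs` and `missing_docs` and the backtick convention are choices made for this sketch, not part of any OpenAI tooling.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Assumption: doc references appear as backticked paths starting with "docs/".
DOC_REF = re.compile(r"`(docs/[^`]+)`")


def referenced_docs(agents_md_text: str) -> List[str]:
    """Return every backticked docs/ path, in order of appearance."""
    return DOC_REF.findall(agents_md_text)


def missing_docs(agents_md_text: str, repo_root: Path) -> List[str]:
    """Return referenced docs that do not exist as files under repo_root."""
    return [
        ref
        for ref in referenced_docs(agents_md_text)
        if not (repo_root / ref).is_file()
    ]


if __name__ == "__main__":
    agents_md = (
        "## Start Here\n"
        "- Read `docs/architecture.md`\n"
        "- Read `docs/invariants.md`\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        # Only one of the two referenced docs exists in this demo repo.
        (root / "docs" / "architecture.md").write_text("# Architecture\n")
        for ref in missing_docs(agents_md, root):
            print(f"AGENTS.md points at a missing file: {ref}")
```

Run in CI, a check like this would fail the build when the starting path drifts away from the actual docs.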

What to Borrow

Borrow	Why
Use `AGENTS.md` as a short TOC	Starting path matters more than background length
Split `docs/invariants.md`	Hard architecture rules must be easy to find
Govern MCP / skills / hooks	Tool access and lifecycle automation drift like docs
Standardize provider setup as a plugin	API key creation, platform access, and troubleshooting should not depend on personal memory
Design sandbox boundaries by task type	Sensitive data, dependencies, and side effects need control
Promote browser QA	"Looks implemented" and "works" are different
Schedule stale-doc cleanup	Input quality is maintained over time
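
Scheduled stale-doc cleanup needs a way to nominate candidates. The sketch below flags Markdown files whose modification time is older than a threshold. It assumes file mtime is a usable proxy for freshness, which often fails after a fresh clone; a real setup would more likely consult version-control history. The function name and the 90-day threshold are illustrative.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_stale_docs(
    docs_dir: Path, max_age_days: int, now: Optional[float] = None
) -> List[Path]:
    """Return .md files under docs_dir not modified within max_age_days.

    Hypothetical helper: uses filesystem mtime as a stand-in for
    "last meaningfully reviewed".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path in docs_dir.rglob("*.md") if path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        docs.mkdir()
        fresh = docs / "architecture.md"
        old = docs / "runbooks.md"
        fresh.write_text("current\n")
        old.write_text("outdated\n")
        # Backdate one file by 200 days to simulate a neglected doc.
        backdated = time.time() - 200 * SECONDS_PER_DAY
        os.utime(old, (backdated, backdated))
        for path in find_stale_docs(docs, max_age_days=90):
            print(f"review or delete: {path.name}")
```

The output is a review list, not an automatic deletion; deciding who acts on it is the ownership question raised in the next section.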

Translation to Your Team

Ask:

What should the agent read in the first three minutes?
Which MCP, skills, shell, and filesystem permissions are open in the sandbox?
Which browser/log/test checks are mandatory?
Which lifecycle events should hooks and remote approvals capture?
Which provider setup and API troubleshooting steps should become a plugin or skill?
Who deletes stale docs, and when?
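
Once a team has answers, they can be written down as data and enforced at the end of each task. The following is a minimal sketch of such a gate, loosely mirroring the non-negotiables in the example `AGENTS.md`. The `TeamPolicy` class, the check names, and the path prefixes are hypothetical, and it simplifies by treating browser QA as mandatory for every change rather than only UI changes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TeamPolicy:
    """Hypothetical policy: answers to the questions above, as data."""

    mandatory_checks: Tuple[str, ...] = ("lint", "typecheck", "browser_qa")
    # Assumed locations of schema and auth code; adjust per repository.
    escalation_prefixes: Tuple[str, ...] = ("db/schema/", "auth/")


def gate(
    policy: TeamPolicy,
    check_results: Dict[str, bool],
    changed_paths: Iterable[str],
) -> str:
    """Return "blocked", "needs_human", or "ready" for a finished task."""
    # A check that was never run counts as failed.
    failed = [
        name
        for name in policy.mandatory_checks
        if not check_results.get(name, False)
    ]
    if failed:
        return "blocked"
    if any(path.startswith(policy.escalation_prefixes) for path in changed_paths):
        return "needs_human"
    return "ready"


if __name__ == "__main__":
    policy = TeamPolicy()
    results = {"lint": True, "typecheck": True, "browser_qa": True}
    print(gate(policy, results, ["app/ui/button.tsx"]))
    print(gate(policy, results, ["auth/session.py"]))
    print(gate(policy, {"lint": True}, ["app/ui/button.tsx"]))
```

In a hook-based setup, a function like this would run at the lifecycle event where the agent declares the task finished, with "needs_human" routed to an approval step.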

References

OpenAI, Harness Engineering, 2026-02-11
OpenAI, The next evolution of the Agents SDK, 2026-04-15
OpenAI API Changelog, 2026-05-06 / 2026-05-19
OpenAI, Work with Codex from anywhere, 2026-05-14
OpenAI, OpenAI Developers plugin for Codex, 2026-05-07
OpenAI Codex Hooks / Remote Connections docs, as read on 2026-05-23
