External Case Comparison

Compare OpenAI, Anthropic, Toss, gstack, and revfactory/harness by input, state, verification, and rollout.

Key takeaways

The useful question across cases is not "who is right?" but which distinct problem each example solves.
OpenAI tackles search cost and doc entropy (knowledge architecture); Anthropic tackles self-evaluation bias and runtime coupling (control-loop engineering).
Toss addresses team-wide distribution and reproducibility; gstack keeps parallel work from descending into chaos; revfactory/harness removes the need to redesign a harness from scratch each time.
The comparison table aligns each case along core input, externalized state, verification interface, and approval/rollout.
Shared conclusion: a better work environment beats a better single prompt, and long work needs external state, evaluation loops, and operations.

When studying harness engineering, the question is not "who is right?" The useful question is: which problem is each example solving?

OpenAI

Repo-readable systems, observability, runtime surface, and cleanup.

Anthropic

Planner/evaluator, managed runtime, permission classifier, and handoff.

Toss

Executable SSOT, domain layers, and frictionless team rollout.

gstack

Sprint, command surface, QA, and release gate.

revfactory/harness

Domain-first harness generation and team architecture.

Comparison Table

Case	Core input	Externalized state	Verification interface	Approval / rollout	Strongest message
OpenAI	`AGENTS.md`, `docs/`, MCP, skills, sandbox	Docs, code, observability, workspace manifest	Browser, logs, metrics, hooks	Cleanup, remote approval, Secure MCP, plugins	Repo + runtime surface is the harness
Anthropic	Task contract, permission policy	Durable session log, planner/builder/evaluator handoff	Evaluator, QA, permission classifier	Retry budget, handoff, managed runtime	Separate load-bearing scaffolding and runtime boundaries
Toss	Global/domain/local rules	Workflow and SSOT	Executable docs and procedures	Domain HITL	Push harnesses into executable team systems
gstack	Sprint phase, command, host adapter	Phase artifacts, checkpoint, learning	Review, test, ship, browser/device QA	Team mode, auto-update, release gate	Run it like a software factory
revfactory/harness	Domain analysis	Agent/skill files, team architecture	Validation and testing, A/B pilot	Generated harness refinement	A harness can generate a harness
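One way to read the table is to treat the four axes as fields on a record, so that cases can be compared along a single axis. The sketch below is illustrative only: the `HarnessCase` class, its field names, and the `cases_with` helper are hypothetical and do not come from any of the compared projects. Only the values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessCase:
    """One row of the comparison table (hypothetical structure)."""
    name: str
    core_input: tuple[str, ...]
    externalized_state: tuple[str, ...]
    verification: tuple[str, ...]
    rollout: tuple[str, ...]


# Values transcribed from the table; the encoding itself is an assumption.
CASES = (
    HarnessCase(
        "OpenAI",
        ("AGENTS.md", "docs/", "MCP", "skills", "sandbox"),
        ("docs", "code", "observability", "workspace manifest"),
        ("browser", "logs", "metrics", "hooks"),
        ("cleanup", "remote approval", "Secure MCP", "plugins"),
    ),
    HarnessCase(
        "Anthropic",
        ("task contract", "permission policy"),
        ("durable session log", "planner/builder/evaluator handoff"),
        ("evaluator", "QA", "permission classifier"),
        ("retry budget", "handoff", "managed runtime"),
    ),
    HarnessCase(
        "Toss",
        ("global/domain/local rules",),
        ("workflow", "SSOT"),
        ("executable docs and procedures",),
        ("domain HITL",),
    ),
    HarnessCase(
        "gstack",
        ("sprint phase", "command", "host adapter"),
        ("phase artifacts", "checkpoint", "learning"),
        ("review", "test", "ship", "browser/device QA"),
        ("team mode", "auto-update", "release gate"),
    ),
    HarnessCase(
        "revfactory/harness",
        ("domain analysis",),
        ("agent/skill files", "team architecture"),
        ("validation and testing", "A/B pilot"),
        ("generated harness refinement",),
    ),
)


def cases_with(axis: str, keyword: str) -> list[str]:
    """Return the names of cases whose given axis mentions the keyword."""
    needle = keyword.lower()
    return [
        case.name
        for case in CASES
        if any(needle in item.lower() for item in getattr(case, axis))
    ]
```

For example, `cases_with("verification", "qa")` picks out the cases that list a QA step on the verification axis, which is a quick way to see which examples to study for a given concern.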

Which Technical Problem Is Being Solved?

Case	Problem	Technical reading
OpenAI	Search cost and documentation entropy	Knowledge architecture
Anthropic	Long-running self-evaluation bias and runtime coupling	Control-loop and runtime-boundary engineering
Toss	Team distribution and reproducibility	Workflow distribution
gstack	Parallel work without chaos	Production pipeline design
revfactory/harness	Repeating harness design itself	Meta-architecture generation

Updates as of 2026-05-23

Case	Latest addition
OpenAI	Agents SDK model-native harness, sandbox execution, TypeScript sandbox agents, Secure MCP Tunnel, Codex remote/hooks, Developers plugin
Anthropic	Claude Code auto mode prompt-injection probe and transcript classifier, Managed Agents session-harness-sandbox split, finance agent templates
gstack	23 specialists, 8 power tools, 10 AI coding agent hosts, team mode auto-update, iOS live-device QA, checkpoint and learning flows
revfactory/harness	v1.2.0 L3 Meta-Factory / Team-Architecture Factory, marketplace install, Harness 100, author-measured A/B results with caveat

Detailed Interpretation

Recommended Order

Need	Read first
Improve repo and docs	OpenAI
Design evaluation loops and retry budgets	Anthropic
Roll out team workflows	Toss
Build opinionated sprint pipelines	gstack
Generate domain-specific harnesses	revfactory/harness

Shared Conclusion

A better work environment matters more than a better single prompt.
Longer work requires external state and evaluation loops.
Team adoption requires executable workflows, commands, and approvals.
Generic templates are starting points; performance comes from domain-specific harnesses.
Harnesses must be operated and cleaned up.
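The second and third points can be made concrete with a small loop. What follows is a minimal sketch under stated assumptions, not the implementation used by any of the compared projects: `run_with_evaluator`, `toy_build`, `toy_evaluate`, the JSON state layout, and the `needs_human` status are all hypothetical names chosen for illustration. It shows a builder and a separate evaluator, a retry budget, and state written to disk after every attempt so that the run can be inspected or handed off.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_evaluator(
    task: str,
    build: Callable[[str, str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    state_path: Path,
    retry_budget: int = 3,
) -> dict:
    """Run build/evaluate rounds until the evaluator passes or the budget is spent."""
    state = {"task": task, "attempts": [], "status": "running"}
    feedback = ""
    for attempt in range(1, retry_budget + 1):
        output = build(task, feedback)       # the builder sees the last feedback
        passed, feedback = evaluate(output)  # a separate evaluator judges the result
        state["attempts"].append(
            {"attempt": attempt, "output": output, "passed": passed, "feedback": feedback}
        )
        # Externalize state after every attempt, not only at the end.
        state_path.write_text(json.dumps(state, indent=2))
        if passed:
            state["status"] = "done"
            break
    else:
        # Budget exhausted without a pass: escalate instead of looping forever.
        state["status"] = "needs_human"
    state_path.write_text(json.dumps(state, indent=2))
    return state


def toy_build(task: str, feedback: str) -> str:
    """Stand-in for an agent call (hypothetical); folds feedback into the next draft."""
    return f"{task}: draft" if not feedback else f"{task}: draft + {feedback}"


def toy_evaluate(output: str) -> tuple[bool, str]:
    """Stand-in evaluator (hypothetical); passes only drafts that mention tests."""
    if "tests" in output:
        return True, ""
    return False, "add tests"
```

Calling `run_with_evaluator("add login", toy_build, toy_evaluate, Path("state.json"))` runs the toy builder against the toy evaluator and leaves the attempt history in `state.json`. Keeping the evaluator separate from the builder is the design choice that matters here, since it avoids relying on the builder's own judgment of its work.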

References

OpenAI, "Harness Engineering", 2026-02-11 https://openai.com/ko-KR/index/harness-engineering/
OpenAI, "The next evolution of the Agents SDK", 2026-04-15 https://openai.com/index/the-next-evolution-of-the-agents-sdk/
OpenAI, "Work with Codex from anywhere", 2026-05-14 https://openai.com/index/work-with-codex-from-anywhere/
OpenAI API Changelog https://developers.openai.com/api/docs/changelog
OpenAI Developers plugin for Codex https://developers.openai.com/learn/developers-codex-plugin
Anthropic harness design https://www.anthropic.com/engineering/harness-design-long-running-apps
Anthropic Claude Code auto mode https://www.anthropic.com/engineering/claude-code-auto-mode
Anthropic Managed Agents https://www.anthropic.com/engineering/managed-agents
Anthropic financial agents https://www.anthropic.com/news/finance-agents
Toss harness article https://toss.tech/article/harness-for-team-productivity
gstack README https://github.com/garrytan/gstack
revfactory/harness README https://github.com/revfactory/harness

