Team Harness Rollout Strategy

Use Toss, gstack, revfactory, OpenAI, and Anthropic patterns to scale personal routines into a team execution system.

Key takeaways

Rolling out a harness means distributing a way of working as a system, not handing out documents; the unit of rollout is workflow (commands, skills, templates, hooks, sandbox/MCP, plugins, remote approval, docs).
Team size dictates scope: small teams start with AGENTS.md plus a few core commands and an updates log; growing teams add global/domain/local layers and runtime boundary policy; platform orgs add a harness registry and telemetry.
Each external case contributes a distinct lesson: Toss raises the productivity floor with layered SSOT, OpenAI standardizes runtime primitives, Anthropic guards trust boundaries, gstack supplies opinionated role commands, revfactory generates harnesses from domain analysis.
The 30/60/90 plan moves from externalizing two or three repeated failures, to splitting domain layers and wiring release gates, to adding telemetry, updates, and garbage collection cadence.
The rollout is working when new members reach a similar baseline, review comments shift from repeated mistakes to better decisions, and model changes shake the team less.

Using a good harness by yourself is different from getting the whole team to produce work at a similar baseline quality.

Toss frames this as raising the team's productivity floor. gstack and revfactory approach it through opinionated workflow and generated harnesses. OpenAI and Anthropic add runtime primitives: sandbox, MCP, hooks, remote approvals, plugins, sessions, and permission classifiers.

The Unit of Rollout Is Workflow

Rollout unit	Role
Command	Encapsulates repeated work sequences
Skill	Packages role-specific knowledge
Template	Standardizes plans, runbooks, updates, release notes
Hook / Script	Automates validation and blocking
Sandbox / MCP	Controls execution and internal tool access
Plugin	Installs provider setup, domain workflow, API key setup, troubleshooting
Remote approval	Keeps human judgment reachable during long work
Doc	Explains why and records the baseline date

Team Expansion Model

What Different Team Sizes Need

Small teams start with:

short AGENTS.md;
two or three core commands such as review and browser QA;
updates log.

Growing teams start with:

global / domain / local layers;
shared release gate;
domain skills and templates;
runtime boundary policy.

Platform organizations start with:

harness registry;
domain harnesses for product groups;
telemetry, stale detection, lifecycle management.
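
The three starting sets above can be written down as data, so the scope for a given team is looked up rather than remembered. The following is a minimal sketch, assuming the small / growing / platform split from the key takeaways and assuming that a larger tier keeps everything the smaller tiers started with. `TeamTier`, `STARTING_SCOPE`, and `rollout_scope` are hypothetical names, not part of any tool mentioned here.

```python
from enum import Enum


class TeamTier(Enum):
    """Team sizes in growth order (hypothetical labels)."""
    SMALL = 1
    GROWING = 2
    PLATFORM = 3


# Starting components per tier, copied from the lists above.
STARTING_SCOPE = {
    TeamTier.SMALL: [
        "short AGENTS.md",
        "two or three core commands",
        "updates log",
    ],
    TeamTier.GROWING: [
        "global / domain / local layers",
        "shared release gate",
        "domain skills and templates",
        "runtime boundary policy",
    ],
    TeamTier.PLATFORM: [
        "harness registry",
        "domain harnesses for product groups",
        "telemetry, stale detection, lifecycle management",
    ],
}


def rollout_scope(tier: TeamTier) -> list[str]:
    """Return the components for a tier plus those of every smaller tier.

    Assumption: scope is cumulative, so a growing team still keeps
    its AGENTS.md, core commands, and updates log.
    """
    scope: list[str] = []
    for candidate in TeamTier:  # Enum iteration follows definition order
        if candidate.value <= tier.value:
            scope.extend(STARTING_SCOPE[candidate])
    return scope
```

Calling `rollout_scope(TeamTier.GROWING)` returns the small-team items followed by the growing-team items, which makes it easy to see what a team should already have before it adds the next layer.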

Rollout Lessons

From Toss

Separate global, domain, and local layers.
Make workflow and plugins act as executable SSOT.
Move personal expert habits into team workflow.
If the harness is not frictionless, adoption collapses.
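
One way to picture the global / domain / local separation is as a layered settings lookup. This is a sketch only, under the assumption (not stated by Toss) that a more specific layer overrides a more general one. `LAYER_ORDER` and `resolve_layers` are hypothetical names, and the setting keys in the usage note are invented for illustration.

```python
from typing import Any

# Most general layer first; later layers override earlier ones (assumption).
LAYER_ORDER = ("global", "domain", "local")


def resolve_layers(layers: dict[str, dict[str, Any]]) -> dict[str, tuple[Any, str]]:
    """Merge layered settings and record which layer supplied each value.

    Returns a mapping of setting name -> (value, layer name), so a reader
    can see both the effective value and where it came from.
    """
    resolved: dict[str, tuple[Any, str]] = {}
    for name in LAYER_ORDER:
        for key, value in layers.get(name, {}).items():
            resolved[key] = (value, name)  # overwrite any less specific layer
    return resolved
```

For example, if the global layer sets `release_gate` to `"required"` and a domain layer sets it to `"required-with-browser-qa"`, the resolved entry is `("required-with-browser-qa", "domain")`. Keeping the source layer next to the value is a design choice that makes overrides easy to audit.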

From OpenAI

Treat MCP, skills, AGENTS.md, shell, and apply_patch as standard primitives.
Use sandbox and Manifest to make inputs, outputs, dependencies, and side effects predictable.
Use hooks for prompt checks, validation, logging, and memory.
Design remote approval for long-running work.
Prefer private MCP connection paths over exposing internal servers.
Use plugin surfaces for repeated provider setup and API troubleshooting.
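
The hooks item above (prompt checks, validation, logging) can be illustrated with a small hook runner. This is a hedged sketch and not OpenAI's hook API: `HookResult`, `run_hooks`, the event dictionary shape, and the `API_KEY=` check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HookResult:
    allowed: bool
    reason: str = ""


# A hook receives an event dict and decides whether work may continue.
Hook = Callable[[dict], HookResult]


def block_credential_in_prompt(event: dict) -> HookResult:
    """Prompt check: refuse prompts that appear to contain a credential."""
    if "API_KEY=" in event.get("prompt", ""):
        return HookResult(False, "prompt appears to contain a credential")
    return HookResult(True)


def make_logging_hook(log: list) -> Hook:
    """Logging: record each event name, never block."""
    def hook(event: dict) -> HookResult:
        log.append(event.get("name", "unknown"))
        return HookResult(True)
    return hook


def run_hooks(hooks: list, event: dict) -> HookResult:
    """Run hooks in order and stop at the first one that blocks."""
    for hook in hooks:
        result = hook(event)
        if not result.allowed:
            return result
    return HookResult(True)
```

Placing the logging hook first means blocked events are still recorded, which is usually what a team wants when it later reviews why work was stopped.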

From Anthropic

Do not bind session, harness, and sandbox into one failure boundary.
Treat auto-approval as a policy made of a trust boundary, block rules, and allow exceptions.
Check subagent handoff at delegation and return.
Keep credentials behind vaults, scoped resources, or MCP proxies.
Package domain workflow with skills, connectors, subagents, audit logs, and approval flows.
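
The auto-approval item above names three parts: a trust boundary, block rules, and allow exceptions. The sketch below shows one possible way to combine them. It is not Anthropic's classifier; the patterns, the `decide` function, and the precedence (block rules win, everything unmatched goes to a human) are assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; a real policy would be maintained by the team.
BLOCK_RULES = ["rm -rf *", "git push --force*"]
ALLOW_EXCEPTIONS = ["git status", "git diff*", "pytest*"]


def decide(command: str, inside_trust_boundary: bool) -> str:
    """Classify a command as 'block', 'auto_approve', or 'ask_human'.

    Assumed precedence:
    1. Block rules always win.
    2. Allow exceptions apply only inside the trust boundary.
    3. Anything else is escalated to a human.
    """
    if any(fnmatchcase(command, pattern) for pattern in BLOCK_RULES):
        return "block"
    if inside_trust_boundary and any(
        fnmatchcase(command, pattern) for pattern in ALLOW_EXCEPTIONS
    ):
        return "auto_approve"
    return "ask_human"
```

Defaulting to `ask_human` rather than to approval keeps the policy conservative: a command is only run without review when it is both inside the boundary and explicitly listed.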

From gstack

Provide opinionated role commands.
Connect review, test, ship, and reflect.
Keep browser QA and release docs as separate steps.
Manage install paths and auto-update policies per agent host.
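
Connecting review, test, ship, and reflect can be sketched as a gated sequence. This is not gstack's implementation. It assumes that a failed stage causes later gated stages to be skipped and that reflect runs regardless of outcome; `run_workflow` and the status strings are hypothetical.

```python
from typing import Callable

GATED_STAGES = ("review", "test", "ship")


def run_workflow(steps: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run review -> test -> ship in order, then always run reflect.

    Each step is a callable returning True on success. Once a gated
    stage fails, the remaining gated stages are skipped (assumption).
    """
    status: dict[str, str] = {}
    failed = False
    for stage in GATED_STAGES:
        if failed:
            status[stage] = "skipped"
        elif steps[stage]():
            status[stage] = "passed"
        else:
            status[stage] = "failed"
            failed = True
    # Reflect runs even after a failure, so the failure itself gets recorded.
    steps["reflect"]()
    status["reflect"] = "ran"
    return status
```

Running reflect unconditionally is a design choice for this sketch: the most useful retrospectives often come from runs that did not ship.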

From revfactory/harness

Analyze the domain.
Choose an architecture pattern.
Generate agent teams and skills.
Tune through validation.

30 / 60 / 90 Day Rollout

30 days: choose two or three repeated failures and externalize them into commands and checklists.

60 days: split domain layers and connect review, browser QA, and release gates.

90 days: add telemetry, updates, and garbage collection cadence.
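
For the 30-day step, the "two or three repeated failures" can be picked from data rather than from memory. The following is a minimal sketch, assuming review comments have already been labeled with short tags; the tag names in the usage note, `top_repeated_failures`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def top_repeated_failures(tags: list, limit: int = 3, min_count: int = 2) -> list:
    """Return up to `limit` (tag, count) pairs that occurred at least `min_count` times.

    One-off mistakes are filtered out, because only repeated failures
    are worth externalizing into a command or checklist.
    """
    counts = Counter(tags)
    repeated = [(tag, n) for tag, n in counts.most_common() if n >= min_count]
    return repeated[:limit]
```

Given tags such as `"missing-test"` four times, `"stale-doc"` three times, `"wrong-branch"` twice, and `"typo"` once, the function returns the first three and drops the one-off typo. Each returned tag is a candidate for a command or checklist item.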

Minimum Team Package

Component	Minimum contents
Entry docs	`AGENTS.md`, reading path, required verification
Domain docs	Architecture, invariants, release gates
Workflow	Review, QA, ship, updates
Runtime boundary	Sandbox permissions, MCP allowlist, hooks, approval policy, classifier/trust-boundary config
Provider/domain package	Plugin, skill, connector, cookbook rules
Operating log	Updates and stale cleanup
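
The table above can double as a checklist. As a rough sketch, the required contents can be stored as sets and compared with what a team has in place; `MINIMUM_PACKAGE` mirrors the table, while `missing_items` and the shape of its input are assumptions made for this example.

```python
# Required contents per component, taken from the table above.
MINIMUM_PACKAGE = {
    "Entry docs": {"AGENTS.md", "reading path", "required verification"},
    "Domain docs": {"architecture", "invariants", "release gates"},
    "Workflow": {"review", "QA", "ship", "updates"},
    "Runtime boundary": {
        "sandbox permissions",
        "MCP allowlist",
        "hooks",
        "approval policy",
        "classifier/trust-boundary config",
    },
    "Provider/domain package": {"plugin", "skill", "connector", "cookbook rules"},
    "Operating log": {"updates", "stale cleanup"},
}


def missing_items(present: dict) -> dict:
    """Return, per component, the required items that are not yet present.

    `present` maps a component name to the set of items the team has.
    Components with nothing missing are left out of the result.
    """
    gaps = {}
    for component, required in MINIMUM_PACKAGE.items():
        missing = required - present.get(component, set())
        if missing:
            gaps[component] = sorted(missing)
    return gaps
```

An empty result means the minimum package is complete; anything else is a short, concrete to-do list for the next rollout step.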

Signs It Is Working

New team members finish first tasks at a similar baseline quality.
Review comments shift from repeated mistakes to better decisions.
Previously personal routines become commands and skills.
Model changes shake the team less.

Conclusion

Rolling out a harness is not distributing documents. It is distributing a better way of working as a system.

