Team Harness Rollout Strategy
Use Toss, gstack, revfactory, OpenAI, and Anthropic patterns to scale personal routines into a team execution system.
Key takeaways
- Rolling out a harness means distributing a way of working as a system, not handing out documents; the unit of rollout is workflow (commands, skills, templates, hooks, sandbox/MCP, plugins, remote approval, docs).
- Team size dictates scope: small teams start with AGENTS.md plus a few core commands and an updates log; growing teams add global/domain/local layers and runtime boundary policy; platform orgs add a harness registry and telemetry.
- Each external case contributes a distinct lesson: Toss raises the productivity floor with layered SSOT, OpenAI standardizes runtime primitives, Anthropic guards trust boundaries, gstack supplies opinionated role commands, revfactory generates harnesses from domain analysis.
- The 30/60/90 plan moves from externalizing two or three repeated failures, to splitting domain layers and wiring release gates, to adding telemetry, updates, and garbage collection cadence.
- It is working when new members hit a similar baseline, review comments shift from repeated mistakes to better decisions, and model changes shake the team less.
Using a good harness on your own is different from getting the whole team to produce work at a similar baseline quality.
Toss frames this as raising the team's productivity floor. gstack and revfactory approach it through opinionated workflow and generated harnesses. OpenAI and Anthropic add runtime primitives: sandbox, MCP, hooks, remote approvals, plugins, sessions, and permission classifiers.
The Unit of Rollout Is Workflow
| Rollout unit | Role |
|---|---|
| Command | Encapsulates repeated work sequences |
| Skill | Packages role-specific knowledge |
| Template | Standardizes plans, runbooks, updates, release notes |
| Hook / Script | Automates validation and blocking |
| Sandbox / MCP | Controls execution and internal tool access |
| Plugin | Installs provider setup, domain workflow, API key setup, troubleshooting |
| Remote approval | Keeps human judgment reachable during long work |
| Doc | Explains why and records the baseline date |
Team Expansion Model
What Different Team Sizes Need
Small teams start with:
- short AGENTS.md;
- two or three core commands such as review and browser QA;
- updates log.
Growing teams add:
- global / domain / local layers;
- shared release gate;
- domain skills and templates;
- runtime boundary policy.
Platform orgs add:
- harness registry;
- domain harnesses for product groups;
- telemetry, stale detection, lifecycle management.
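The lists above can be read as a cumulative scope: each stage keeps what the previous one had and adds its own items. The following is a minimal sketch of that reading, assuming three stages named small, growing, and platform (the names follow the key takeaways; `TeamStage`, `STARTER_ITEMS`, and `rollout_scope` are hypothetical names introduced only for illustration).

```python
from enum import Enum
from typing import Dict, List


class TeamStage(Enum):
    # Illustrative stage labels; declaration order doubles as growth order.
    SMALL = "small"
    GROWING = "growing"
    PLATFORM = "platform"


# Items copied from the lists above, keyed by the stage that introduces them.
STARTER_ITEMS: Dict[TeamStage, List[str]] = {
    TeamStage.SMALL: [
        "short AGENTS.md",
        "two or three core commands",
        "updates log",
    ],
    TeamStage.GROWING: [
        "global / domain / local layers",
        "shared release gate",
        "domain skills and templates",
        "runtime boundary policy",
    ],
    TeamStage.PLATFORM: [
        "harness registry",
        "domain harnesses for product groups",
        "telemetry, stale detection, lifecycle management",
    ],
}


def rollout_scope(stage: TeamStage) -> List[str]:
    """Return every item a team at `stage` is assumed to carry.

    Assumption: scope is cumulative, so later stages include earlier ones.
    """
    order = list(TeamStage)
    items: List[str] = []
    for earlier in order[: order.index(stage) + 1]:
        items.extend(STARTER_ITEMS[earlier])
    return items
```

Calling `rollout_scope(TeamStage.GROWING)` returns the three small-team items followed by the four growing-team items, which gives a quick checklist of what should already exist before the next layer is added.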
Rollout Lessons
From Toss
- Separate global, domain, and local layers.
- Make workflow and plugins act as executable SSOT.
- Move personal expert habits into team workflow.
- If the harness is not frictionless, adoption collapses.
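One way to picture the global, domain, and local split is as layered configuration where narrower layers override broader ones. The sketch below is an assumption about how such layers might be resolved, not a description of Toss's actual implementation; `merge_layers` and all keys and values in the example are hypothetical.

```python
from typing import Any, Dict


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers left to right; later (narrower) layers win.

    Nested dicts are merged key by key, every other value is replaced.
    Input layers are not mutated.
    """
    result: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_layers(result[key], value)
            else:
                result[key] = value
    return result


# Hypothetical layer contents, for illustration only.
global_layer = {"commands": {"review": "team-review"}, "release_gate": True}
domain_layer = {"commands": {"qa": "browser-qa"}}
local_layer = {"commands": {"review": "payments-review"}}

effective = merge_layers(global_layer, domain_layer, local_layer)
```

Here `effective["commands"]` keeps the domain-level `qa` command while the local layer replaces `review`, and the global `release_gate` setting passes through untouched. Resolving layers in code rather than in prose is one way to make the harness act as an executable single source of truth.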
From OpenAI
- Treat MCP, skills, AGENTS.md, shell, and apply_patch as standard primitives.
- Use sandbox and Manifest to make inputs, outputs, dependencies, and side effects predictable.
- Use hooks for prompt checks, validation, logging, and memory.
- Design remote approval for long-running work.
- Prefer private MCP connection paths over exposing internal servers.
- Use plugin surfaces for repeated provider setup and API troubleshooting.
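The hook bullet above (checks, validation, logging) can be sketched as a small chain of functions that each inspect a proposed command. This is a generic illustration and not OpenAI's hook API; `HookResult`, `run_hooks`, and both example hooks are hypothetical names, and the string checks are deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HookResult:
    allowed: bool
    reason: str = ""


Hook = Callable[[str], HookResult]


def block_inline_secrets(command: str) -> HookResult:
    # Hypothetical check: reject commands that embed a credential inline.
    if "API_KEY=" in command:
        return HookResult(False, "inline credential in command")
    return HookResult(True)


def block_force_push(command: str) -> HookResult:
    # Hypothetical check: force pushes should go through a human.
    if "push" in command and "--force" in command:
        return HookResult(False, "force push requires human approval")
    return HookResult(True)


def run_hooks(command: str, hooks: List[Hook], log: List[str]) -> HookResult:
    """Run hooks in order, log each verdict, stop at the first block."""
    for hook in hooks:
        result = hook(command)
        verdict = "allow" if result.allowed else "block"
        log.append(f"{hook.__name__}: {verdict}")
        if not result.allowed:
            return result
    return HookResult(True)
```

Passing `"git push --force origin main"` through `[block_inline_secrets, block_force_push]` logs one allow and one block and returns the blocking result with its reason, so validation and logging happen in the same pass.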
From Anthropic
- Do not bind session, harness, and sandbox into one failure boundary.
- Treat auto approval as a policy with three parts: a trust boundary, block rules, and allow exceptions.
- Check subagent handoff at delegation and return.
- Keep credentials behind vaults, scoped resources, or MCP proxies.
- Package domain workflow with skills, connectors, subagents, audit logs, and approval flows.
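The auto-approval bullet above names three parts: a trust boundary, block rules, and allow exceptions. The sketch below shows one possible way to combine them into a decision. The precedence order (block rules first, then the boundary, then exceptions) is an assumption, as are the names `ApprovalPolicy` and `decide` and the example patterns; it is not Anthropic's classifier.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass(frozen=True)
class ApprovalPolicy:
    trusted_paths: Tuple[str, ...]      # trust boundary: where work may run unattended
    block_rules: Tuple[str, ...]        # command patterns that are never auto-approved
    allow_exceptions: Tuple[str, ...]   # command patterns approved even outside the boundary


def decide(policy: ApprovalPolicy, command: str, path: str) -> str:
    """Return 'deny', 'auto_approve', or 'ask_human' for a proposed action."""
    # Assumed precedence: block rules override everything else.
    if any(fnmatchcase(command, rule) for rule in policy.block_rules):
        return "deny"
    if any(fnmatchcase(path, pattern) for pattern in policy.trusted_paths):
        return "auto_approve"
    if any(fnmatchcase(command, rule) for rule in policy.allow_exceptions):
        return "auto_approve"
    return "ask_human"


# Hypothetical policy values, for illustration only.
policy = ApprovalPolicy(
    trusted_paths=("workspace/*",),
    block_rules=("rm -rf *", "git push --force*"),
    allow_exceptions=("git status*",),
)
```

With this policy, a test run inside `workspace/` is auto-approved, `rm -rf build` is denied even inside the boundary, and an unlisted command outside the boundary falls through to `ask_human`, which is where remote approval would take over.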
From gstack
- Provide opinionated role commands.
- Connect review, test, ship, and reflect.
- Keep browser QA and release docs as separate steps.
- Manage install paths and auto-update policies per agent host.
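Connecting review, test, ship, and reflect means each step gates the next. A minimal sketch of that chaining follows, assuming each step can be reduced to a callable that reports success or failure; `run_workflow` and the placeholder steps are hypothetical and do not reflect gstack's actual commands.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_workflow(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first failure.

    Returns the names of completed steps and the name of the failed step
    (None when every step passed).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder steps: real ones would invoke the team's commands.
steps: List[Step] = [
    ("review", lambda: True),
    ("test", lambda: False),   # simulate a failing test run
    ("ship", lambda: True),
    ("reflect", lambda: True),
]

completed, failed = run_workflow(steps)
```

In this run `completed` contains only `review` and `failed` is `test`, so nothing ships. Keeping browser QA and release docs as their own entries in the list preserves them as separate, visible gates.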
From revfactory/harness
- Analyze the domain.
- Choose an architecture pattern.
- Generate agent teams and skills.
- Tune through validation.
30 / 60 / 90 Day Rollout
30 days: choose two or three repeated failures and externalize them into commands and checklists.
60 days: split domain layers and connect review, browser QA, and release gates.
90 days: add telemetry, updates, and garbage collection cadence.
Minimum Team Package
| Component | Minimum contents |
|---|---|
| Entry docs | AGENTS.md, reading path, required verification |
| Domain docs | Architecture, invariants, release gates |
| Workflow | Review, QA, ship, updates |
| Runtime boundary | Sandbox permissions, MCP allowlist, hooks, approval policy, classifier/trust-boundary config |
| Provider/domain package | Plugin, skill, connector, cookbook rules |
| Operating log | Updates and stale cleanup |
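The operating log row implies some way to notice stale material. One simple approach, sketched here under assumptions, is to compare each document's recorded baseline date against a maximum age. The 90-day threshold, the function name `find_stale`, and the file names in the example are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def find_stale(
    baselines: Dict[str, date], today: date, max_age_days: int = 90
) -> List[str]:
    """Return names whose baseline date is older than `max_age_days`.

    The 90-day default is an illustrative assumption, not a recommendation
    from the source.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, baseline in baselines.items() if baseline < cutoff)


# Hypothetical baseline dates, for illustration only.
baselines = {
    "AGENTS.md": date(2025, 5, 1),
    "release-gates.md": date(2025, 1, 15),
}

stale = find_stale(baselines, today=date(2025, 6, 1))
```

Here `stale` contains only `release-gates.md`, because its baseline is more than 90 days before the chosen `today`. Running a check like this on a regular cadence gives the cleanup step a concrete worklist.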
Signs It Is Working
- New team members finish first tasks at a similar baseline quality.
- Review comments shift from repeated mistakes to better decisions.
- Previously personal routines become commands and skills.
- Model changes shake the team less.
Conclusion
Rolling out a harness is not distributing documents. It is distributing a better way of working as a system.