Operations: Entropy and Garbage Collection
This section explains how harnesses decay after launch and how teams keep docs, workflows, permissions, hooks, and runtime surfaces current.
Key takeaways
- A harness decays after launch through documentation drift, workflow drift, criteria aging, returning tacit knowledge, runtime-surface drift, and auto-approval policy drift.
- Cleanup runs on a cadence: weekly for broken links, biweekly for stale docs, monthly for approval policy, quarterly for MCP, hooks, and sandbox permissions, plus a review at every model upgrade.
- Watch metrics like outcome variance, late QA defects, missing approvals, MCP failure rate, classifier denial rate, and auto-to-human fallback rate.
- The garbage-collection backlog maps targets like huge rule files, unused commands, broad trust boundaries, and provider drift to specific treatments.
- Ownership must be explicit per area, and healthy signs include docs and workflow changing together and model gains triggering measured simplification.
A harness is not finished when it launches. It decays over time.
Why Harnesses Break
1. Documentation Drift
The code changes, but the docs do not.
2. Workflow Drift
Commands, hooks, and checklists stop matching release flow or model behavior.
3. Criteria Aging
Review steps that were necessary for an older model may be pure overhead for a newer one.
4. Tacit Knowledge Returns
Under pressure, teams solve problems in chat and meetings instead of updating the harness.
5. Runtime Surface Drift
MCP servers, hooks, sandbox permissions, and remote approval policy can change faster than the docs that describe them.
6. Auto Approval Policy Drift
Trust boundaries can widen, deny-rule exceptions can accumulate, and credential-vault assumptions can fall behind infrastructure changes.
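
Documentation drift, the first failure mode above, is the easiest to check mechanically. The following is a minimal sketch, assuming the team keeps a hypothetical mapping from each doc to the source files it describes and can obtain a last-modified timestamp per path (for example from version control or the filesystem). The function name, the mapping, and the paths are illustrative and not part of any existing tool.

```python
from datetime import datetime


def find_stale_docs(doc_sources, last_modified):
    """Return (doc, newer_sources) pairs for docs older than what they describe.

    doc_sources:   hypothetical mapping of doc path -> list of source paths
    last_modified: mapping of path -> datetime of the last change
    """
    stale = []
    for doc, sources in doc_sources.items():
        doc_time = last_modified[doc]
        # A doc is suspect if any file it describes changed after the doc did.
        newer = sorted(s for s in sources if last_modified[s] > doc_time)
        if newer:
            stale.append((doc, newer))
    return sorted(stale)


if __name__ == "__main__":
    # Illustrative data only; real timestamps would come from the repository.
    last_modified = {
        "docs/release.md": datetime(2024, 1, 1),
        "scripts/release.sh": datetime(2024, 2, 1),
    }
    doc_sources = {"docs/release.md": ["scripts/release.sh"]}
    for doc, newer in find_stale_docs(doc_sources, last_modified):
        print(f"{doc} is older than: {', '.join(newer)}")
```

A timestamp comparison only flags candidates. A doc can be older than its code and still be correct, so the output is a review list, not a verdict.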
Operating Cadence
| Cadence | Work |
|---|---|
| Weekly | Broken links, failed commands, repeated QA issues |
| Biweekly | Stale docs, unused skills, duplicate rules |
| Monthly | Approval policy, test gates, domain rules |
| Quarterly | MCP allowlist, hooks, sandbox permissions, auto approval policy, remote approval logs |
| Model upgrade | Remove unnecessary scaffolding, identify new failure modes |
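
The cadence table can be kept as data so that a scheduled job reports which cleanup work is due. The sketch below is one possible encoding. The interval lengths in days (30 for monthly, 91 for quarterly) and the names `CADENCE_DAYS`, `TASKS`, and `due_tasks` are assumptions for illustration. The model-upgrade row is triggered by an event and not by a date, so it is left out of the date-based check.

```python
from datetime import date

# Assumed interval lengths in days; adjust to the team's calendar.
CADENCE_DAYS = {"weekly": 7, "biweekly": 14, "monthly": 30, "quarterly": 91}

# Work items copied from the cadence table.
TASKS = {
    "weekly": ["broken links", "failed commands", "repeated QA issues"],
    "biweekly": ["stale docs", "unused skills", "duplicate rules"],
    "monthly": ["approval policy", "test gates", "domain rules"],
    "quarterly": [
        "MCP allowlist",
        "hooks",
        "sandbox permissions",
        "auto approval policy",
        "remote approval logs",
    ],
}


def due_tasks(last_run, today):
    """List work items whose cadence interval has elapsed since the last run.

    last_run: mapping of cadence name -> date of the last completed pass.
    A cadence that has never run is treated as due.
    """
    due = []
    for cadence, interval in CADENCE_DAYS.items():
        last = last_run.get(cadence)
        if last is None or (today - last).days >= interval:
            due.extend(TASKS[cadence])
    return due


if __name__ == "__main__":
    last_run = {name: date(2024, 6, 1) for name in CADENCE_DAYS}
    print(due_tasks(last_run, date(2024, 6, 10)))
```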
Metrics to Watch
- Outcome variance for similar requests.
- Repeated review comments.
- QA defects found late.
- Stale doc count.
- Missing or incorrect approvals.
- Changes merged without browser or log verification.
- MCP tool failure or misuse rate.
- Hook validation bypass rate.
- Classifier denial rate and safe-alternative success rate.
- Auto approval to human fallback rate.
- Remote approval wait time and rework reduction.
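
Several of these metrics are simple ratios over an event log. As a rough sketch, assuming a hypothetical log where each event is a dict with a `kind` field plus a boolean outcome field, two of them could be computed as follows. The event schema and field names are invented for this example and would need to match whatever the harness actually records.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or 0.0 when there is nothing to count."""
    return numerator / denominator if denominator else 0.0


def harness_metrics(events):
    """Compute two of the metrics above from a hypothetical event log.

    Assumed event shapes:
      {"kind": "auto_approval", "fell_back_to_human": bool}
      {"kind": "mcp_call", "ok": bool}
    """
    auto = [e for e in events if e["kind"] == "auto_approval"]
    mcp = [e for e in events if e["kind"] == "mcp_call"]
    fallbacks = sum(1 for e in auto if e["fell_back_to_human"])
    failures = sum(1 for e in mcp if not e["ok"])
    return {
        "auto_to_human_fallback_rate": rate(fallbacks, len(auto)),
        "mcp_failure_rate": rate(failures, len(mcp)),
    }


if __name__ == "__main__":
    sample = [
        {"kind": "auto_approval", "fell_back_to_human": False},
        {"kind": "auto_approval", "fell_back_to_human": True},
        {"kind": "mcp_call", "ok": True},
    ]
    print(harness_metrics(sample))
```

A single value says little on its own. The trend between reviews is what signals drift, for example a fallback rate that climbs after a policy change.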
Garbage Collection Backlog
| Target | Risk | Treatment |
|---|---|---|
| Huge rule file | Not read, likely stale | Split into TOC plus docs |
| Unused slash command | Confuses the team | Delete or merge |
| Unverified checklist | False confidence | Automate or remove |
| Old-model helper role | Slows work without quality gain | Experiment, then delete or shrink |
| Unused MCP server | Extra permission and attack surface | Remove from allowlist |
| Old hook | Blocks or allows incorrectly | Update or disable |
| Broad trust boundary | Auto approval exceeds intent | Redefine deny rules and exceptions |
| Provider/plugin drift | API key or connector path changes | Update owner and version |
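
Some backlog targets can be found automatically. The sketch below illustrates the "Unused MCP server" row, assuming the allowlist is available as a list of server names and call counts for the review window are available as a mapping. Both inputs, the server names, and the `min_calls` threshold are hypothetical.

```python
def unused_mcp_servers(allowlist, calls_by_server, min_calls=1):
    """Return allowlisted servers with fewer than min_calls calls in the window.

    allowlist:       iterable of server names currently permitted
    calls_by_server: mapping of server name -> call count for the review window
    """
    return sorted(
        name for name in allowlist if calls_by_server.get(name, 0) < min_calls
    )


if __name__ == "__main__":
    # Illustrative names only.
    allowlist = ["issue-tracker", "browser", "legacy-db"]
    calls = {"issue-tracker": 120, "browser": 45}
    for name in unused_mcp_servers(allowlist, calls):
        print(f"candidate for removal from allowlist: {name}")
```

The result is a list of removal candidates for the owner to confirm, since a server may be rarely used but still required, for instance during incident response.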
Ownership
| Responsibility | Owner |
|---|---|
| Domain rules | Team lead or domain owner |
| Docs, links, rule cleanup | Docs owner or rotating owner |
| Evaluation criteria | Reviewer, QA, or platform role |
| MCP / sandbox / hooks / auto approval | Platform or security owner |
| Provider plugins and domain templates | AI platform or domain owner |
| Model upgrade review | AI platform owner or adoption lead |
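
One way to keep ownership explicit is to store the table as data and fail a check when any area has no named owner. This is a minimal sketch. The area keys mirror the table rows, while the `owners` mapping, its values, and the function name are assumptions about how a team might record this.

```python
# Areas taken from the ownership table.
REQUIRED_AREAS = [
    "domain rules",
    "docs, links, rule cleanup",
    "evaluation criteria",
    "MCP / sandbox / hooks / auto approval",
    "provider plugins and domain templates",
    "model upgrade review",
]


def unowned_areas(owners):
    """Return required areas that have no owner or only a blank owner."""
    return [area for area in REQUIRED_AREAS if not owners.get(area, "").strip()]


if __name__ == "__main__":
    # Hypothetical assignments; two areas are deliberately left open.
    owners = {
        "domain rules": "team lead",
        "docs, links, rule cleanup": "rotating owner",
        "evaluation criteria": "QA",
        "MCP / sandbox / hooks / auto approval": "",
    }
    for area in unowned_areas(owners):
        print(f"no owner assigned: {area}")
```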
Healthy Signs
- Changes include updates-log entries.
- Docs and workflow change together.
- Model improvements trigger measured simplification, not blind removal.
- Personal tricks become commands, skills, or checklists when repeated.
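
The first two signs can be checked per change. The sketch below assumes a hypothetical repository layout in which workflow files live under `commands/` and `hooks/`, docs live under `docs/`, and the updates log is a file named `UPDATES.md`. All of these paths are placeholders to adapt, not a convention the harness prescribes.

```python
def change_hygiene(
    changed_paths,
    workflow_prefixes=("commands/", "hooks/"),
    docs_prefix="docs/",
    log_path="UPDATES.md",
):
    """Return a list of hygiene problems for one change (empty means healthy)."""
    touches_workflow = any(p.startswith(workflow_prefixes) for p in changed_paths)
    touches_docs = any(p.startswith(docs_prefix) for p in changed_paths)
    problems = []
    # Docs and workflow should change together.
    if touches_workflow and not touches_docs:
        problems.append("workflow changed without docs")
    # Every change should leave an updates-log entry.
    if log_path not in changed_paths:
        problems.append("missing updates-log entry")
    return problems


if __name__ == "__main__":
    print(change_hygiene(["hooks/pre_merge.py"]))
```

Run as a pre-merge check, this turns a healthy sign into a gate. Whether it should block or only warn is a team decision.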
Conclusion
The final stage of harness engineering is operations. A durable harness assumes entropy and makes cleanup part of the system.