Operations: Entropy and Garbage Collection

This page explains how harnesses decay and how teams keep docs, workflows, permissions, hooks, and runtime surfaces current.

Key takeaways

A harness decays after launch through documentation drift, workflow drift, criteria aging, returning tacit knowledge, runtime-surface drift, and auto-approval policy drift.
Cleanup runs on a cadence: broken links weekly, stale docs biweekly, approval policy monthly, MCP, hooks, and sandbox permissions quarterly, plus a review at every model upgrade.
Watch metrics like outcome variance, late QA defects, missing approvals, MCP failure rate, classifier denial rate, and auto-to-human fallback rate.
The garbage-collection backlog maps targets like huge rule files, unused commands, broad trust boundaries, and provider drift to specific treatments.
Ownership must be explicit per area, and healthy signs include docs and workflow changing together and model gains triggering measured simplification.

A harness is not finished when it launches. It decays over time.

Why Harnesses Break

1. Documentation Drift

The code changes, but the docs do not change with it.
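One way to make this kind of drift visible is to compare when a doc last changed against when the code it describes last changed. The sketch below is illustrative only: the mapping from docs to source files, the timestamps, and the 14-day grace period are all assumptions, and a real team would pull the dates from version control.

```python
from datetime import datetime, timedelta


def find_stale_docs(doc_sources, last_changed, grace=timedelta(days=14)):
    """Return docs whose covered code changed more than `grace` after the doc did.

    doc_sources:  hypothetical mapping of doc path -> list of source paths it describes
    last_changed: hypothetical mapping of path -> datetime of its last change
    """
    stale = []
    for doc, sources in doc_sources.items():
        newest_source = max(last_changed[s] for s in sources)
        if newest_source - last_changed[doc] > grace:
            stale.append(doc)
    return sorted(stale)


# Example data (made up for illustration).
last_changed = {
    "docs/deploy.md": datetime(2024, 1, 1),
    "src/deploy.py": datetime(2024, 3, 1),
    "docs/review.md": datetime(2024, 3, 1),
    "src/review.py": datetime(2024, 3, 5),
}
doc_sources = {
    "docs/deploy.md": ["src/deploy.py"],
    "docs/review.md": ["src/review.py"],
}
stale_docs = find_stale_docs(doc_sources, last_changed)
```

The grace period keeps small follow-up commits from flagging a doc that was updated alongside the main change.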

2. Workflow Drift

Commands, hooks, and checklists stop matching release flow or model behavior.

3. Criteria Aging

Review steps that were necessary for an older model may become overhead.

4. Tacit Knowledge Returns

Under pressure, teams solve problems in chat and meetings instead of updating the harness.

5. Runtime Surface Drift

MCP servers, hooks, sandbox permissions, and remote approval policy can drift faster than docs.

6. Auto Approval Policy Drift

Trust boundaries can widen, deny-rule exceptions can accumulate, and credential-vault assumptions can fall behind infrastructure changes.
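Widening like this is easier to catch when each policy review diffs the current policy against the last approved snapshot. The following is a minimal sketch under stated assumptions: the policy is represented as a plain dictionary, and the keys `trusted_paths` and `deny_exceptions` are hypothetical names, not the format of any particular tool.

```python
def policy_drift(baseline, current, keys=("trusted_paths", "deny_exceptions")):
    """Report entries present in `current` but absent from `baseline`.

    Both arguments are hypothetical policy snapshots: dicts of key -> list of strings.
    Only additions are reported, because additions are what widen the boundary.
    """
    report = {}
    for key in keys:
        added = sorted(set(current.get(key, [])) - set(baseline.get(key, [])))
        if added:
            report[key] = added
    return report


# Example snapshots (made up for illustration).
baseline = {"trusted_paths": ["src/"], "deny_exceptions": []}
current = {
    "trusted_paths": ["src/", "infra/"],
    "deny_exceptions": ["rm -rf build/"],
}
drift = policy_drift(baseline, current)
```

An empty report means nothing was added since the baseline; a non-empty one is a list of items for the policy owner to confirm or revert.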

Operating Cadence

Cadence	Work
Weekly	Broken links, failed commands, repeated QA issues
Biweekly	Stale docs, unused skills, duplicate rules
Monthly	Approval policy, test gates, domain rules
Quarterly	MCP allowlist, hooks, sandbox permissions, auto approval policy, remote approval logs
Model upgrade	Remove unnecessary scaffolding, identify new failure modes
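The cadence table above can be turned into a simple "what is due" check. This is a sketch, not a scheduler: the interval lengths (30 days for monthly, 90 for quarterly) are approximations chosen for illustration, and the model-upgrade review is treated as event-triggered rather than time-based.

```python
from datetime import date, timedelta

# Approximate interval per cadence, in days (assumed values).
CADENCE_DAYS = {"weekly": 7, "biweekly": 14, "monthly": 30, "quarterly": 90}


def due_reviews(last_run, today, model_upgraded=False):
    """Return the cadences whose review is due.

    last_run: hypothetical mapping of cadence name -> date of the last completed review.
    A cadence that has never run is always due.
    """
    due = [
        name
        for name, days in CADENCE_DAYS.items()
        if name not in last_run or today - last_run[name] >= timedelta(days=days)
    ]
    if model_upgraded:
        due.append("model upgrade")
    return due


# Example: every review last ran on 1 June; today is 16 June.
last_run = {name: date(2024, 6, 1) for name in CADENCE_DAYS}
due_now = due_reviews(last_run, date(2024, 6, 16))
```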

Metrics to Watch

Outcome variance for similar requests.
Repeated review comments.
QA defects found late.
Stale doc count.
Missing or incorrect approvals.
Changes merged without browser or log verification.
MCP tool failure or misuse rate.
Hook validation bypass rate.
Classifier denial rate and safe-alternative success rate.
Auto approval to human fallback rate.
Remote approval wait time and rework reduction.
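Several of these metrics are simple ratios over an event log. The sketch below shows two of them, assuming a hypothetical log of dictionaries with `kind` and `outcome` fields; the field names and values are invented for illustration and would need to match whatever the harness actually records.

```python
def rate(events, is_candidate, is_hit):
    """Share of candidate events that are hits, or None when there are no candidates."""
    candidates = [e for e in events if is_candidate(e)]
    if not candidates:
        return None
    hits = sum(1 for e in candidates if is_hit(e))
    return hits / len(candidates)


def auto_to_human_fallback_rate(events):
    """Fraction of auto-approval attempts that ended up needing a human."""
    return rate(
        events,
        lambda e: e["kind"] == "auto_approval",
        lambda e: e["outcome"] == "human_fallback",
    )


def mcp_failure_rate(events):
    """Fraction of MCP tool calls that failed."""
    return rate(
        events,
        lambda e: e["kind"] == "mcp_call",
        lambda e: e["outcome"] == "failed",
    )


# Example log (made up for illustration).
events = [
    {"kind": "auto_approval", "outcome": "approved"},
    {"kind": "auto_approval", "outcome": "approved"},
    {"kind": "auto_approval", "outcome": "approved"},
    {"kind": "auto_approval", "outcome": "human_fallback"},
    {"kind": "mcp_call", "outcome": "ok"},
    {"kind": "mcp_call", "outcome": "failed"},
]
```

Returning `None` for an empty denominator keeps "no data" distinct from a genuine rate of zero.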

Garbage Collection Backlog

Target	Risk	Treatment
Huge rule file	Not read, likely stale	Split into TOC plus docs
Unused slash command	Confuses the team	Delete or merge
Unverified checklist	False confidence	Automate or remove
Old-model helper role	Slows work without quality gain	Experiment, then delete or shrink
Unused MCP server	Extra permission and attack surface	Remove from allowlist
Old hook	Blocks or allows incorrectly	Update or disable
Broad trust boundary	Auto approval exceeds intent	Redefine deny rules and exceptions
Provider/plugin drift	API key or connector path changes	Update owner and version
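Some backlog targets can be found mechanically. As one example, the "Unused MCP server" row could be populated by comparing the allowlist with recent usage. This is a sketch under assumptions: the server names, the usage timestamps, and the 90-day window are hypothetical, and the function only proposes candidates for removal rather than removing anything.

```python
from datetime import datetime, timedelta


def unused_servers(allowlist, last_used, now, window=timedelta(days=90)):
    """Return allowlisted servers with no recorded use inside `window`.

    allowlist: iterable of server names (hypothetical)
    last_used: mapping of server name -> datetime of the most recent call
    """
    return sorted(
        name
        for name in allowlist
        if name not in last_used or now - last_used[name] > window
    )


# Example data (made up for illustration).
allowlist = ["issue-tracker", "browser", "legacy-db"]
last_used = {
    "issue-tracker": datetime(2024, 6, 10),
    "legacy-db": datetime(2024, 1, 5),
}
candidates = unused_servers(allowlist, last_used, now=datetime(2024, 6, 15))
```

A server that has never been called shows up as a candidate too, which is usually the right default for an allowlist review.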

Ownership

Responsibility	Owner
Domain rules	Team lead or domain owner
Docs, links, rule cleanup	Docs owner or rotating owner
Evaluation criteria	Reviewer, QA, or platform role
MCP / sandbox / hooks / auto approval	Platform or security owner
Provider plugins and domain templates	AI platform or domain owner
Model upgrade review	AI platform owner or adoption lead
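Explicit ownership can itself be checked. The sketch below assumes the ownership table is kept as a simple mapping from responsibility to owner; the area names mirror the table above, while the owner values and the idea of storing the table as a dictionary are assumptions for illustration.

```python
# Responsibility areas taken from the ownership table above.
REQUIRED_AREAS = [
    "Domain rules",
    "Docs, links, rule cleanup",
    "Evaluation criteria",
    "MCP / sandbox / hooks / auto approval",
    "Provider plugins and domain templates",
    "Model upgrade review",
]


def missing_owners(owners):
    """Return the required areas that have no owner or a blank owner."""
    return [area for area in REQUIRED_AREAS if not owners.get(area, "").strip()]


# Example mapping (owner names are placeholders).
owners = {
    "Domain rules": "team lead",
    "Docs, links, rule cleanup": "rotating owner",
    "Evaluation criteria": "",
    "MCP / sandbox / hooks / auto approval": "platform owner",
}
unowned = missing_owners(owners)
```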

Healthy Signs

Changes include updates-log entries.
Docs and workflow change together.
Model improvements trigger measured simplification, not blind removal.
Personal tricks become commands, skills, or checklists when repeated.

Conclusion

The final stage of harness engineering is operations. A durable harness assumes entropy and makes cleanup part of the system.
