Ch2. Versioning and Release

Key takeaways

The release unit for an AI service bundles prompts, models, tools, and policy packs into one traceable artifact, each with its own version rule.
A release artifact must record not only what changed but which permissions, eval set, and observability schema validated it.
Gate releases on both quality and security thresholds (quality delta ≥ -1%, cost ≤ +5%, PII exposure == 0, injection-test pass == 100%) before canary rollout at 5-10%.
Run a security checklist covering policy tests, PII scans, injection tests, permission diffs, and approval-resumption tests.
Roll back immediately on PII exposure, policy bypass over 0.5%, privilege escalation, injection success, or pre-auth A2A resource leakage, even if quality improved.

The release unit for an AI service is not just code.
Prompts, models, tools, and policy packs must be managed as one release artifact if you want to control regressions.

Version Components

Component	Recommended Version Rule
Prompt	`p-YYYYMMDD.N`
Model	Provider model ID plus internal compatibility level
Tool Schema	SemVer (`major.minor.patch`)
Policy Pack	Hash version with approval history
MCP Server	Allowlist ID plus scope plus server version
Eval Set	Dataset hash plus grader version
Trace Schema	OTel/AOS field version

The release artifact must show not only what changed, but also which permissions, evaluation set, and observability schema were used to validate it.
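Two of the version rules in the table are easy to check mechanically. The following is a minimal sketch, assuming the prompt rule embeds a real calendar date and that tool schema versions may carry a `tools-` prefix as in the manifest example; the function names are hypothetical and not part of any library.

```python
import re
from datetime import date

# Prompt rule from the table: p-YYYYMMDD.N
PROMPT_VERSION = re.compile(r"^p-(\d{4})(\d{2})(\d{2})\.(\d+)$")
# Tool schema rule from the table: SemVer major.minor.patch.
# The optional "tools-" prefix is an assumption taken from the manifest example.
SEMVER = re.compile(r"^(?:tools-)?(\d+)\.(\d+)\.(\d+)$")


def is_valid_prompt_version(version: str) -> bool:
    """Return True if the string matches p-YYYYMMDD.N with a real calendar date."""
    match = PROMPT_VERSION.match(version)
    if not match:
        return False
    year, month, day, _ = (int(group) for group in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def parse_semver(version: str) -> tuple[int, int, int]:
    """Split a major.minor.patch string into integers."""
    match = SEMVER.match(version)
    if not match:
        raise ValueError(f"not a SemVer string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_breaking_change(old: str, new: str) -> bool:
    """Treat a major-version increase as a breaking tool schema change."""
    return parse_semver(new)[0] > parse_semver(old)[0]
```

A check like `is_breaking_change("1.8.0", "2.0.0")` could then be used to require extra review before a tool schema with a new major version enters a release.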

Release Pipeline

1. Pass offline evaluation criteria: quality, safety, and cost.
2. Pass security checks: policy tests, PII scan, and prompt-injection tests.
3. Start canary traffic at 5-10%.
4. Monitor burn rate for errors, latency, cost, and security violations.
5. Expand gradually if healthy: 25% to 50% to 100%.
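The canary and expansion steps above can be sketched as a small decision function. This is an illustrative sketch, not a prescribed implementation: it assumes the canary starts at 5%, that burn rate is defined as the observed rate divided by the budgeted rate, and that any burn rate above 1.0 means "unhealthy". The names `STAGES`, `burn_rate`, and `next_traffic_percent` are hypothetical.

```python
# Canary start (assumed 5%) followed by the expansion steps from the text.
STAGES = [5, 25, 50, 100]


def burn_rate(observed_rate: float, budgeted_rate: float) -> float:
    """Observed rate divided by the budgeted rate; above 1.0 means the
    budget is being consumed faster than planned."""
    if budgeted_rate <= 0:
        raise ValueError("budgeted_rate must be positive")
    return observed_rate / budgeted_rate


def next_traffic_percent(current: int, burn_rates: dict[str, float],
                         max_burn: float = 1.0) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback.

    burn_rates maps a signal name (errors, latency, cost, security
    violations) to its current burn rate. max_burn is an assumed threshold.
    """
    if current not in STAGES:
        raise ValueError(f"unknown stage: {current}")
    if any(rate > max_burn for rate in burn_rates.values()):
        return 0  # unhealthy: stop expanding and roll back
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]
```

For example, a healthy 5% canary moves to 25%, while a single signal burning faster than its budget returns 0 regardless of how the other signals look.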

Release Gate Example

release_gate:
  # Quality and performance
  quality_delta: '>= -1.0%'
  cost_delta: '<= +5%'
  p95_latency_delta: '<= +10%'

  # Security
  safety_violation_rate: '<= 0.2%'
  pii_exposure_count: '== 0'
  prompt_injection_test_pass_rate: '== 100%'
  privilege_escalation_attempts: '== 0'
  security_policy_test_coverage: '>= 95%'

  # Agent/tool controls
  unapproved_tool_scope_changes: '== 0'
  mcp_server_allowlist_diff: 'reviewed'
  trace_schema_compatible: true
  approval_resume_test_passed: true
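
One way to enforce a gate like the one above is to parse each threshold string and compare it with a measured value. The sketch below assumes the gate has already been loaded into a Python dict, that measured percentages are plain numbers in the same unit as the threshold (so `-0.4` means -0.4%), and that a missing metric should fail the gate. The helper names are hypothetical.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}
# Matches rules such as '>= -1.0%', '<= +5%', or '== 0'.
RULE = re.compile(r"^(>=|<=|==)\s*([+-]?\d+(?:\.\d+)?)%?$")


def check(rule, measured) -> bool:
    """Evaluate one gate rule against one measured value."""
    if isinstance(rule, str):
        match = RULE.match(rule.strip())
        if match:
            op, threshold = match.groups()
            return OPS[op](float(measured), float(threshold))
    # Literal rules such as 'reviewed' or True must match exactly.
    return measured == rule


def failed_gates(gate: dict, measured: dict) -> list[str]:
    """Return the names of gate entries that are missing or not satisfied."""
    return [
        name for name, rule in gate.items()
        if name not in measured or not check(rule, measured[name])
    ]


gate = {
    "quality_delta": ">= -1.0%",
    "cost_delta": "<= +5%",
    "pii_exposure_count": "== 0",
    "mcp_server_allowlist_diff": "reviewed",
    "trace_schema_compatible": True,
}
measured = {
    "quality_delta": -0.4,
    "cost_delta": 6.2,  # over the +5% budget
    "pii_exposure_count": 0,
    "mcp_server_allowlist_diff": "reviewed",
    "trace_schema_compatible": True,
}
print(failed_gates(gate, measured))  # ['cost_delta']
```

Failing closed on missing metrics is a design choice in this sketch: a release with no PII scan result is treated the same as one that failed the scan.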

Release Artifact Manifest Example

release_artifact:
  app_version: ai-support-2026.05.17-1
  prompt_version: p-20260517.2
  model_policy: routing-20260517
  tool_schema_version: tools-1.8.0
  mcp_allowlist_hash: sha256:8d4f...
  skill_manifest_hash: sha256:52ac...
  eval_dataset_hash: sha256:93ab...
  grader_version: judge-20260517.1
  trace_schema: genai-otel-1.41.0+aos-adapter-0.3
  approvals:
    owner: platform-ai
    security: approved
    compliance: approved
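
A manifest like this is only useful if it is checked before release. The following sketch shows one possible check, assuming the manifest is available as a Python dict, that content hashes are SHA-256 digests prefixed with `sha256:`, and that both the security and compliance approvals are mandatory. The function names and the required-field list are assumptions for illustration.

```python
import hashlib

# Fields taken from the manifest example above.
REQUIRED_FIELDS = (
    "app_version", "prompt_version", "model_policy", "tool_schema_version",
    "mcp_allowlist_hash", "skill_manifest_hash", "eval_dataset_hash",
    "grader_version", "trace_schema",
)


def content_hash(data: bytes) -> str:
    """Hash raw bytes (for example an eval dataset file) in manifest format."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def manifest_problems(manifest: dict) -> list[str]:
    """List missing fields and missing approvals; empty means releasable."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not manifest.get(field)
    ]
    approvals = manifest.get("approvals") or {}
    for role in ("security", "compliance"):
        if approvals.get(role) != "approved":
            problems.append(f"{role} approval missing")
    return problems
```

Recomputing `content_hash` over the eval dataset at release time and comparing it with `eval_dataset_hash` would confirm that the release was validated against the dataset the manifest claims.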

Security Verification Checklist

Policy tests: system-instruction bypass, tool access control, output filtering.
PII scan: sensitive data in prompt templates, RAG documents, and system messages.
Injection tests: direct prompt bypass, RAG poisoning, and tool-result manipulation.
Permission diff review: added or expanded scopes for MCP servers, skills, and function tools.
Approval resumption test: a run paused for human review resumes from the same state after approval or rejection.
Trace compatibility: new trace fields remain compatible with dashboards, eval graders, and incident runbooks.
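
The permission diff review in the checklist lends itself to automation. Below is a minimal sketch, assuming each MCP server, skill, or function tool is represented as a name mapped to a set of scope strings; the server names and scope values in the usage note are hypothetical.

```python
def scope_diff(old: dict[str, set[str]],
               new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per server or tool, the scopes present in `new` but not in `old`.

    A name that appears only in `new` is reported with all of its scopes,
    so newly added servers surface alongside expanded ones.
    """
    added: dict[str, set[str]] = {}
    for name, scopes in new.items():
        extra = scopes - old.get(name, set())
        if extra:
            added[name] = extra
    return added
```

Comparing `{"files-mcp": {"read"}}` with `{"files-mcp": {"read", "write"}, "search-mcp": {"query"}}` would report the new `write` scope and the entire new `search-mcp` entry, which is the set a reviewer needs to approve. Removed scopes are intentionally ignored here, since the checklist item concerns added or expanded scopes.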

Security Rollback Conditions

Roll back immediately when PII exposure is detected.
Roll back when policy bypass exceeds 0.5%.
Roll back when privilege escalation attempts are detected.
Roll back when a prompt-injection success case is found.
Roll back when a new MCP server or skill performs unauthorized network, file, or system access.
Roll back when an A2A peer leaks internal resource existence before authentication.
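
The rollback conditions above can be expressed as a single check over monitoring signals. This is a sketch under stated assumptions: the field names are hypothetical, the policy bypass rate is a percentage of requests, and how each signal is collected is out of scope.

```python
from dataclasses import dataclass


@dataclass
class SecuritySignals:
    """Counters observed during a rollout (hypothetical field names)."""
    pii_exposures: int = 0
    policy_bypass_rate: float = 0.0  # percent of requests
    privilege_escalation_attempts: int = 0
    injection_successes: int = 0
    unauthorized_access_events: int = 0  # new MCP server or skill
    preauth_resource_leaks: int = 0      # A2A peer, before authentication


def rollback_reasons(signals: SecuritySignals) -> list[str]:
    """Return every rollback condition that currently holds."""
    reasons = []
    if signals.pii_exposures > 0:
        reasons.append("PII exposure detected")
    if signals.policy_bypass_rate > 0.5:
        reasons.append("policy bypass exceeds 0.5%")
    if signals.privilege_escalation_attempts > 0:
        reasons.append("privilege escalation attempt detected")
    if signals.injection_successes > 0:
        reasons.append("prompt-injection success case found")
    if signals.unauthorized_access_events > 0:
        reasons.append("unauthorized network, file, or system access")
    if signals.preauth_resource_leaks > 0:
        reasons.append("A2A resource existence leaked before authentication")
    return reasons
```

Any non-empty result would trigger a rollback. Quality and cost metrics are deliberately absent from the inputs, so an improvement there cannot offset a security condition.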

Security First

Do not proceed with release if any security criterion fails, even when performance or quality improves. Security is a non-negotiable gate.

Practice Principle

Model upgrades often carry more regression risk than expected. Prefer parallel operation and staged rollout over an immediate full switch.

Baseline and Sources

Item	Baseline Date	Recheck By	Primary Source
Human review/resumable state	2026-05-17	2026-06-16	https://developers.openai.com/api/docs/guides/agents/guardrails-approvals
OTel GenAI trace schema	2026-05-17	2026-06-16	https://opentelemetry.io/docs/specs/semconv/gen-ai/
MCP/Skill scope control	2026-05-17	2026-06-16	https://owasp.org/www-project-mcp-top-10/
