Name: LLMOps and AgentOps in Production
Author: reopt

A production operating system for turning experimental AI features into reliable services

Making an AI feature work is not the same as making it operable.
In production, model quality must be managed alongside release control, SLOs, cost stability, and incident response.

This handbook treats LLMOps and AgentOps as one operating system rather than separate disciplines.

Core Goal

Build an operating foundation where quality, cost, and security remain stable even as teams repeatedly change models, prompts, tools, and agent workflows.

English Edition

This English edition was selected because AgentOps, MCP/A2A, trace-first evaluation, and AI cost governance are high-interest topics for international platform, SRE, and AI infrastructure teams.

May 2026 Update

A2A latest v1.0.0 and MCP 2025-11-25 security requirements: OAuth 2.1, audience binding, and token passthrough prohibition (Ch1)
Trace-first evaluation, agent workflow trace grading, and production trace to dataset/eval loops (Ch3, Ch5)
Human review, resumable approval state, hosted/private MCP trust boundaries, and Agentic Skills supply-chain security (Ch4)
OpenTelemetry GenAI Development status and OWASP AOS work-in-progress status clarified (Ch5)
GPT-5.5/GPT-5.4/GPT-5.4 mini, Claude 4.7/4.6/4.5, and DeepSeek V4 pricing baseline refreshed (Ch6)
Incident handling expanded for MCP/skill compromise, A2A webhook abuse, and automated recovery approval boundaries (Ch8)

Core Operating Formulas

\text{Unit Cost per Task} = \sum_i(\text{Token}_i \times \text{Price}_i) + \text{Tool Cost} + \text{Infra Cost}
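
As a rough illustration of the unit cost formula, the sketch below sums token cost per category and adds tool and infrastructure cost. The `TokenUsage` type, the per-million-token price unit, and all numbers are assumptions made for this example; they are not taken from the handbook's pricing baseline.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenUsage:
    """One token category (e.g. input or output) for a single task."""
    tokens: int               # tokens consumed in this category
    price_per_million: float  # hypothetical price in USD per 1M tokens


def unit_cost_per_task(
    usages: Iterable[TokenUsage],
    tool_cost: float = 0.0,
    infra_cost: float = 0.0,
) -> float:
    """Sum of (tokens_i * price_i) plus tool cost plus infra cost, in USD."""
    token_cost = sum(u.tokens * u.price_per_million / 1_000_000 for u in usages)
    return token_cost + tool_cost + infra_cost


# Illustrative numbers only: 12k input tokens and 1.5k output tokens,
# plus a small tool-call fee and an amortized infrastructure share.
example_cost = unit_cost_per_task(
    [TokenUsage(12_000, 2.00), TokenUsage(1_500, 8.00)],
    tool_cost=0.004,
    infra_cost=0.002,
)
print(f"unit cost per task: ${example_cost:.4f}")
```

Keeping each token category as its own term makes it easier to see which one moves the total when a model or prompt changes.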

\text{Error Budget Burn Rate} = \frac{\text{Current Error Rate}}{\text{Allowed Error Rate}}
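
The burn rate ratio can be sketched as follows. The function name, the request counts, and the 0.1% allowed error rate are hypothetical values chosen for the example. Under the usual SRE reading, a value above 1 means the error budget is being consumed faster than the SLO allows.

```python
def error_budget_burn_rate(current_error_rate: float, allowed_error_rate: float) -> float:
    """Current error rate divided by the error rate the SLO allows."""
    if allowed_error_rate <= 0:
        raise ValueError("allowed_error_rate must be positive")
    if current_error_rate < 0:
        raise ValueError("current_error_rate must not be negative")
    return current_error_rate / allowed_error_rate


# Illustrative window: 40 failed tasks out of 10,000 requests (0.4% errors)
# against an assumed allowed error rate of 0.1%.
failed, total = 40, 10_000
burn_rate = error_budget_burn_rate(failed / total, allowed_error_rate=0.001)
print(f"burn rate: {burn_rate:.1f}x")
```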

Operating Maturity Model

Level	State	Characteristics	Promotion Criteria
L1 Prototype	Demo-driven	Manual prompts and ad hoc operations	Standardized logs
L2 Controlled	Basic operations	Versioning and release control introduced	Offline evaluation system
L3 Reliable	Reliable operations	SLOs, guardrails, and fallback automation	Joint cost/quality optimization
L4 Adaptive	Supervised adaptation	Drift detection, policy tuning, automated recovery	Change evidence and approval logs retained

Go-Live Gates

Gate	Example Pass Criteria
Quality gate	Core task success rate >= 95%
Safety gate	Policy violation rate <= 0.2%
Performance gate	p95 latency within budget
Cost gate	Unit cost at most 5% over budget
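
The example gates above could be checked before release with something like the following sketch. The thresholds come from the table; the type names, field names, and sample metrics are hypothetical, and the cost gate is read here as "no more than 5% over budget", which is an interpretation of the table entry.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReleaseMetrics:
    """Measured values for a release candidate (hypothetical field names)."""
    task_success_rate: float      # fraction in 0..1
    policy_violation_rate: float  # fraction in 0..1
    p95_latency_ms: float
    unit_cost_usd: float


@dataclass(frozen=True)
class Budgets:
    """Team-defined budgets; the table does not fix these numbers."""
    p95_latency_ms: float
    unit_cost_usd: float


def evaluate_gates(m: ReleaseMetrics, b: Budgets) -> Dict[str, bool]:
    """Return pass/fail per gate using the example criteria from the table."""
    return {
        "quality": m.task_success_rate >= 0.95,
        "safety": m.policy_violation_rate <= 0.002,
        "performance": m.p95_latency_ms <= b.p95_latency_ms,
        # Assumption: "+5%" means up to 5% above the cost budget is tolerated.
        "cost": m.unit_cost_usd <= b.unit_cost_usd * 1.05,
    }


def failed_gates(results: Dict[str, bool]) -> List[str]:
    """Names of gates that did not pass, in table order."""
    return [name for name, passed in results.items() if not passed]


# Illustrative candidate: good quality, safety, and latency, but too expensive.
results = evaluate_gates(
    ReleaseMetrics(0.962, 0.001, 1800.0, 0.045),
    Budgets(p95_latency_ms=2000.0, unit_cost_usd=0.042),
)
blocked_by = failed_gates(results)
print("go-live approved" if not blocked_by else f"blocked by: {blocked_by}")
```

Returning a per-gate result rather than a single boolean keeps the failing gate visible in release logs.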

Contents

Ch1. System Architecture

Ch2. Versioning and Release

Ch3. Evaluation Framework

Ch4. Online Guardrails

Ch5. Observability and SLOs

Ch6. Cost and Latency

Ch7. Experiment Operations

Ch8. Incident Management

Appendix. Verification Report

Appendix. Updates
