Ch1. System Architecture

Separate the control plane and data plane to improve both reliability and change velocity

Key takeaways

Separate the control plane (policy, version registry, eval store) from the data plane (orchestrator, model/tool runtime, cache) to shrink blast radius while preserving iteration speed.
Distinguish tool connectivity (MCP) from agent-to-agent communication (A2A); the 2026 pattern converges on MCP for tool/data boundaries and A2A for delegation between independent agents.
Define trust boundaries per surface: hosted vs private MCP, and public vs extended A2A Agent Cards that hide internal URLs and secrets.
Manage MCP servers in a registry with owner, transport, token audience, scopes, allowed egress, and approval-required tools.
Default to safe behavior on failure: deny on policy-lookup failure, fall back to a lower-tier model, and use durable state and checkpoints for approval waits.

Many recurring LLM system failures trace back to operational policy and execution logic being tangled together.
Separating the two reduces blast radius while preserving iteration speed.

Recommended Structure

Separation Principles

Area	Control Plane	Data Plane
Responsibility	Policy, versions, evaluation criteria	Request handling and response generation
Change cadence	Weekly/monthly	Real-time/daily
Failure impact	Slower decisions	Direct user impact
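
As a rough illustration of the split, the sketch below has a data-plane handler that only asks the control plane for a decision and denies the request when that lookup fails, in line with the deny-on-failure default in the Design Checkpoints. The names `Decision`, `PolicyLookupError`, `handle_request`, and the two policy stubs are hypothetical and exist only for this example; a real control plane would be a separate service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


class PolicyLookupError(Exception):
    """Raised when the control plane cannot be reached or gives no answer."""


def handle_request(action: str, lookup_policy: Callable[[str], bool]) -> Decision:
    """Data-plane entry point: ask the control plane, deny if it cannot answer."""
    try:
        allowed = lookup_policy(action)
    except PolicyLookupError as exc:
        # Fail closed: a control-plane outage must not widen access.
        return Decision(False, f"policy lookup failed: {exc}")
    return Decision(allowed, "policy allowed" if allowed else "policy denied")


# Hypothetical control-plane stubs, for illustration only.
def healthy_policy(action: str) -> bool:
    return action in {"search_docs", "read_issue"}


def broken_policy(action: str) -> bool:
    raise PolicyLookupError("policy store timeout")
```

Passing the lookup in as a callable keeps the handler free of policy content, so policy can change on the control plane's cadence without redeploying the request path.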

2026 Agent Communication Protocols

In multi-agent systems, separate tool connectivity from agent-to-agent communication.

Protocol	Role	Current Baseline	Operating Point
MCP (Model Context Protocol)	Agent to tools/data	2025-11-25	JSON-RPC based. For HTTP transport, validate OAuth 2.1, Protected Resource Metadata, and token audience binding
A2A (Agent2Agent)	Agent to agent	latest v1.0.0	Provides tasks, streaming, push notifications, and extended Agent Cards. Avoid leaking resource existence before authentication
ACP (Agent Communication Protocol)	Agent to agent	Vendor ecosystem	REST/HTTP messaging. Validate security and interoperability before adopting it as an organizational standard

Convergence Pattern

The 2026 operating pattern is converging toward MCP for tool/data boundaries and A2A for delegation between independent agents. They are complementary trust boundaries, not substitutes.

Trust Boundary Design

Boundary	Operating Standard
Hosted/public MCP	Connect only public servers that fit the provider trust model; require approval for high-risk tools
Private/local MCP	Let the runtime own connectivity, filtering, approvals, scope limits, and network egress
A2A public Agent Card	Expose only public capabilities; exclude internal URLs, secrets, and detailed rate limits
A2A extended Agent Card	Serve only to authenticated clients and vary capabilities by client permission
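
The public/extended distinction can be sketched as two views over one internal record. The card layout below is a simplified, hypothetical structure (fields such as `internal_url`, `public`, and `required_permission` are assumptions for this example and are not taken from the A2A Agent Card schema); it only shows the filtering idea.

```python
from typing import Any, Optional

# Hypothetical internal record; field names are illustrative only.
FULL_CARD: dict[str, Any] = {
    "name": "billing-agent",
    "skills": [
        {"id": "invoice_lookup", "public": True, "required_permission": None},
        {"id": "refund_issue", "public": False, "required_permission": "billing:write"},
    ],
    "internal_url": "https://billing.internal.example/a2a",
    "rate_limits": {"per_minute": 120},
}

# Allow-list of top-level fields that may ever leave the service.
PUBLIC_FIELDS = ("name",)


def public_card(card: dict[str, Any]) -> dict[str, Any]:
    """View for unauthenticated callers: public skills only, no internals."""
    out = {key: card[key] for key in PUBLIC_FIELDS}
    out["skills"] = [s["id"] for s in card["skills"] if s["public"]]
    return out


def extended_card(card: dict[str, Any], client_permissions: Optional[set]) -> dict[str, Any]:
    """View for authenticated callers; None means the caller is unauthenticated."""
    if client_permissions is None:
        raise PermissionError("extended card requires an authenticated client")
    out = public_card(card)
    out["skills"] = [
        s["id"]
        for s in card["skills"]
        if s["public"] or s["required_permission"] in client_permissions
    ]
    return out
```

Building both views from an allow-list, rather than deleting known-sensitive keys, means a newly added internal field stays hidden by default.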

MCP Server Registry Example

mcp_servers:
  - id: github-readonly-prod
    owner: platform-ai
    transport: streamable_http
    server_url: https://mcp.example.com/github
    token_audience: mcp://github-readonly-prod
    scopes:
      - repo:read
      - issue:read
    allowed_egress:
      - api.github.com
    approval_required_tools:
      - create_issue
      - write_file
    expires_at: '2026-06-16'
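
A registry entry like this is only useful if the runtime checks against it. The sketch below mirrors the entry above as a Python dict (to avoid a YAML parser dependency) and shows a few checks a runtime might run. The helper names, the treatment of `expires_at` as an inclusive date, and the handling of the token `aud` claim as either a string or a list are assumptions for illustration.

```python
from datetime import date

REQUIRED_FIELDS = (
    "id", "owner", "transport", "server_url", "token_audience",
    "scopes", "allowed_egress", "approval_required_tools",
)

# Mirror of the registry entry above.
ENTRY = {
    "id": "github-readonly-prod",
    "owner": "platform-ai",
    "transport": "streamable_http",
    "server_url": "https://mcp.example.com/github",
    "token_audience": "mcp://github-readonly-prod",
    "scopes": ["repo:read", "issue:read"],
    "allowed_egress": ["api.github.com"],
    "approval_required_tools": ["create_issue", "write_file"],
    "expires_at": "2026-06-16",
}


def missing_fields(entry: dict) -> list:
    """Return the required registry fields that the entry lacks."""
    return [name for name in REQUIRED_FIELDS if name not in entry]


def needs_approval(entry: dict, tool_name: str) -> bool:
    """True when the tool is listed as approval-required for this server."""
    return tool_name in entry["approval_required_tools"]


def egress_allowed(entry: dict, host: str) -> bool:
    """True when the server may call out to the given host."""
    return host in entry["allowed_egress"]


def audience_matches(entry: dict, token_aud) -> bool:
    """True when the token was issued for this server (aud may be str or list)."""
    audiences = [token_aud] if isinstance(token_aud, str) else list(token_aud)
    return entry["token_audience"] in audiences


def is_expired(entry: dict, today: date) -> bool:
    """Assumes expires_at is the last valid day (inclusive)."""
    return today > date.fromisoformat(entry["expires_at"])
```

For example, `needs_approval(ENTRY, "create_issue")` holds while `egress_allowed(ENTRY, "evil.example.net")` does not, so a write attempt pauses for approval and an unexpected outbound host is refused.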

Design Checkpoints

Default to safe behavior when policy lookup fails: deny the request or return a restricted response.
Fall back to a lower-tier model when model routing fails.
Enforce transaction boundaries and idempotency keys around tool calls.
Keep a graceful degradation path when an MCP server is unavailable.
MCP servers should accept only tokens issued for themselves and must not pass those tokens through to upstream APIs.
Reject A2A push notification URLs that target private IPs, localhost, or link-local addresses.
Use durable state and checkpoints for approval waits, long-running work, and external event resumption.
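
The push notification checkpoint above can be sketched with the standard library. This is a minimal first-pass filter, not a complete SSRF defense: it inspects only the literal host in the URL, so a DNS name that resolves to an internal address would still need to be checked after resolution. Requiring `https` is an extra assumption made here, and `is_safe_push_url` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_push_url(url: str) -> bool:
    """Reject push URLs whose literal host is localhost, private, or link-local."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or not host:
        return False  # assumption: only https callbacks are accepted
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name. The addresses it resolves
        # to must be checked separately before connecting.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

With this filter, `https://hooks.example.com/a2a` is accepted, while `https://127.0.0.1/hook`, `https://10.0.0.5/hook`, and `https://169.254.169.254/latest` are rejected.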

Baseline and Sources

Item	Baseline Date	Recheck By	Primary Source
MCP 2025-11-25	2026-05-17	2026-06-16	https://modelcontextprotocol.io/specification/2025-11-25
A2A latest v1.0.0	2026-05-17	2026-06-16	https://a2a-protocol.org/latest/specification/
OpenAI Agents SDK MCP/tracing	2026-05-17	2026-06-16	https://developers.openai.com/api/docs/guides/agents/integrations-observability
