Data Protection

Key takeaways

Data protection starts by mapping what an AI system can access and where prompts, outputs, traces, and embeddings move.
Classify data into tiers from P0 Restricted (secrets, regulated identifiers; block or require exception) down to P3 Public with normal integrity controls.
Minimize what prompts and retrieval pipelines receive, and separate training, inference, evaluation, and logging data paths.
Define retention and deletion for prompts, outputs, traces, and embeddings, and mask P0/P1 data in logs and agent outputs.
Test the system by asking what sensitive data an exported transcript, embedding, trace, or tool log would contain.

AI systems often combine product data, user content, logs, embeddings, and tool outputs. Data protection starts by knowing what the system can access and where that data moves.

Data Classes

Class	Examples	Required posture
P0 Restricted	Secrets, credentials, regulated identifiers	Block or require explicit exception
P1 Sensitive	Customer content, internal documents, support data	Strict access, retention, and logging rules
P2 Internal	Product analytics, operational metadata	Controlled use and monitoring
P3 Public	Published docs, marketing pages	Normal integrity controls
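
The tiers above can be encoded so that pipelines check them before data reaches a prompt. The following is a minimal sketch, not a prescribed implementation: the names `DataClass`, `REQUIRED_POSTURE`, and `may_enter_prompt` are hypothetical, and the only rule it enforces is the P0 posture from the table (block unless an explicit exception exists).

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Data tiers from the table; lower number means more sensitive."""
    P0_RESTRICTED = 0
    P1_SENSITIVE = 1
    P2_INTERNAL = 2
    P3_PUBLIC = 3


# Required posture per tier, copied from the table above.
REQUIRED_POSTURE = {
    DataClass.P0_RESTRICTED: "Block or require explicit exception",
    DataClass.P1_SENSITIVE: "Strict access, retention, and logging rules",
    DataClass.P2_INTERNAL: "Controlled use and monitoring",
    DataClass.P3_PUBLIC: "Normal integrity controls",
}


def may_enter_prompt(data_class: DataClass, has_exception: bool = False) -> bool:
    """Hypothetical gate: P0 data is blocked unless an exception was granted.

    Other tiers pass this gate; their access, retention, and logging rules
    would be enforced elsewhere and are not modeled here.
    """
    if data_class is DataClass.P0_RESTRICTED:
        return has_exception
    return True
```

A retrieval pipeline could call `may_enter_prompt` on each candidate document's tier and drop anything that returns `False` before building the prompt.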

Control Checklist

Minimize what prompts, tools, and retrieval pipelines receive.
Separate training, inference, evaluation, and logging data paths.
Define retention and deletion for prompts, outputs, traces, and embeddings.
Encrypt sensitive data at rest and in transit.
Mask secrets and P0/P1 data in logs and agent outputs.
Document cross-border, vendor, and subprocessor exposure.
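
As one illustration of the masking item in the checklist, the sketch below redacts sensitive-looking values before a log record is emitted. It uses Python's standard `logging.Filter`; the two regular expressions and the names `REDACTION_PATTERNS`, `mask`, and `MaskingFilter` are assumptions chosen for the example, and a real deployment would need patterns matched to its own P0/P1 data.

```python
import logging
import re

# Illustrative patterns only (assumptions, not a complete detector set).
REDACTION_PATTERNS = [
    # key=value or key: value assignments for common credential names
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"),
    # identifier shaped like a US Social Security number
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def mask(text: str) -> str:
    """Replace every match of a redaction pattern with a placeholder."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class MaskingFilter(logging.Filter):
    """Logging filter that masks the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as arguments are masked too.
        record.msg = mask(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter with `handler.addFilter(MaskingFilter())` applies it to everything that handler writes. Pattern-based masking is a backstop: it misses values it has no pattern for, so minimizing what reaches the logger still matters.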

Review Question

If a transcript, embedding, trace, or tool log were exported, what sensitive data would it contain?
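
The review question can be turned into a rough automated check. The sketch below scans the text of exported artifacts and reports which detectors fired for each one. The detector set and the names `DETECTORS` and `scan_export` are hypothetical. Embeddings are numeric vectors, so this approach only covers the text or metadata stored alongside them, not the vectors themselves.

```python
import re

# Hypothetical detectors; extend with patterns for your own P0/P1 data.
DETECTORS = {
    "credential_assignment": re.compile(
        r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"
    ),
    "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def scan_export(artifacts: dict[str, str]) -> dict[str, list[str]]:
    """Map each artifact name to the detectors that matched its text.

    Artifacts with no matches are omitted from the result.
    """
    findings: dict[str, list[str]] = {}
    for name, text in artifacts.items():
        hits = [label for label, pattern in DETECTORS.items() if pattern.search(text)]
        if hits:
            findings[name] = hits
    return findings
```

An empty result means only that none of these patterns matched; it does not show the export is free of sensitive data.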
