Data Protection
Classify, minimize, encrypt, retain, and govern data used by AI systems.
Key takeaways
- Data protection starts by mapping what an AI system can access and where prompts, outputs, traces, and embeddings move.
- Classify data into tiers from P0 Restricted (secrets, regulated identifiers; block or require exception) down to P3 Public with normal integrity controls.
- Minimize what prompts and retrieval pipelines receive, and separate training, inference, evaluation, and logging data paths.
- Define retention and deletion for prompts, outputs, traces, and embeddings, and mask P0/P1 data in logs and agent outputs.
- Test the system by asking what sensitive data an exported transcript, embedding, trace, or tool log would contain.
AI systems often combine product data, user content, logs, embeddings, and tool outputs. Data protection starts by knowing what the system can access and where that data moves.
Data Classes
| Class | Examples | Required posture |
|---|---|---|
| P0 Restricted | Secrets, credentials, regulated identifiers | Block or require explicit exception |
| P1 Sensitive | Customer content, internal documents, support data | Strict access, retention, and logging rules |
| P2 Internal | Product analytics, operational metadata | Controlled use and monitoring |
| P3 Public | Published docs, marketing pages | Normal integrity controls |
Control Checklist
- Minimize what prompts, tools, and retrieval pipelines receive.
- Separate training, inference, evaluation, and logging data paths.
- Define retention and deletion for prompts, outputs, traces, and embeddings.
- Encrypt sensitive data at rest and in transit.
- Mask secrets and P0/P1 data in logs and agent outputs.
- Document cross-border, vendor, and subprocessor exposure.
Review Question
If a transcript, embedding, trace, or tool log were exported, what sensitive data would it contain?