Knowledge Bases and Internal Wikis
Design KBs for RAG, MCP Resources, freshness, and citation-first answers.
Key takeaways
- KBs are core data sources for RAG and MCP Resources; structure decides whether agents retrieve the right doc and cite it or pull stale, irrelevant content.
- Keep docs atomic (one topic, one conclusion) so RAG retrieves targeted chunks instead of a single 100-page guide.
- Rich frontmatter (
id,status,owner,updated,review_after,source_of_truth) and chunking that carries doc ID and heading metadata enable retrieval and demoting of stale docs. - Use a citation-first answer format (answer, evidence with doc_id/section/updated, uncertainty, owner to confirm) and validate retrieval with expected/must-not-return eval sets.
- Resource descriptions help the model, but Resource text stays data; do not promote an external KB document to project instructions.
Knowledge bases are core data sources for RAG and MCP Resources. If the KB is well structured, agents can retrieve the right document and cite it. If not, they retrieve stale or irrelevant content.
Design Goals
| Goal | Meaning |
|---|---|
| Retrieval | title, description, tags, and headings help search |
| Citation | answers can point to doc IDs and sections |
| Freshness | stale docs are lowered or excluded |
| Trust boundary | external text is data, not instructions |
Atomic Documentation
One topic per document. One conclusion per evidence trail.
| Monolithic | Agent-optimized |
|---|---|
| 100-page "developer guide" | 20 focused docs |
| facts hidden in the middle | title/description identify scope |
| RAG retrieves too much | RAG retrieves targeted chunks |
| unclear source | doc ID and heading citation |
Frontmatter
---
id: kb-db-postgres-connection
title: PostgreSQL Connection Settings
description: DB connection settings for dev, staging, and production
category: infrastructure/database
tags: [postgresql, database, connection, environment]
status: current
owner: infra-team
created: 2026-01-15
updated: 2026-05-24
review_after: 2026-08-24
source_of_truth: /docs/env-variables
related:
- /docs/db-migration
- /docs/secrets
---Chunking
| Strategy | Recommendation |
|---|---|
| unit | split by H2/H3 when possible |
| metadata | carry doc ID, title, heading, status, updated |
| tables | keep heading and column meaning |
| code blocks | keep command and explanation together |
| stale docs | demote or exclude from retrieval |
MCP Resource URIs
kb://docs/db/postgres-connection
kb://runbooks/api-latency
kb://policies/refund-policy
schema://warehouse/orders
openapi://public-api/latestResource descriptions help the model, but Resource text is still data. Do not treat an external KB document as a source of project instructions.
Citation-first Answer Format
1. Answer
2. Evidence
- doc_id:
- section:
- updated:
3. Uncertainty
4. Owner to confirmRetrieval Eval
[
{
"question": "Where is the production DB connection string configured?",
"expected_docs": ["kb-db-postgres-connection"],
"must_not_return": ["kb-db-legacy-connection"]
}
]Checklist
| Item | Check |
|---|---|
| docs are atomic | [ ] |
id, status, owner, updated, review_after exist | [ ] |
| chunk metadata includes doc ID and heading | [ ] |
| stale docs are demoted or excluded | [ ] |
| answers cite sources | [ ] |
| MCP Resource URI maps to source URL | [ ] |
| external/user text is not promoted to instructions | [ ] |