The context.md Standard
Understanding the context.md Standard
The context.md file is the deterministic output format of the Moltext compiler. While traditional documentation is designed for human readability—prioritizing visual hierarchy, navigation, and stylistic filler—the context.md standard is optimized for Agentic Logic and Vector Retrieval.
By normalizing disparate HTML pages into a single, high-density Markdown file, Moltext creates a "Ground Truth" document that agents can ingest into their context window or RAG (Retrieval-Augmented Generation) pipelines without the noise of the modern web.
Anatomy of a context.md File
Every file generated by Moltext follows a specific structural pattern to ensure agents can parse it reliably.
1. The Global Header
At the top of the file, Moltext injects metadata regarding the compilation event. This allows agents to understand the provenance and "freshness" of the data.
# Documentation Context
Compiled by Moltext from https://docs.example.com at 2023-10-27T10:00:00.000Z
---
2. Source Encapsulation
Each crawled page is treated as an individual node within the file. Moltext wraps these in clear headers and separators to prevent context bleeding between different sections of the documentation.
## Source: [Page Title](https://docs.example.com/api/reference)
[High-density content here...]
---
Key Requirements for Agent-Native Memory
The context.md standard is governed by five core principles enforced during the compilation process:
- Deterministic Structure: Uses strict ATX-style headers (
#) and fenced code blocks (```). This ensures that simple regex or Markdown parsers used by agents can split the document accurately. - High Information Density: When running in non-raw mode, Moltext utilizes an LLM to strip "conversational filler" (e.g., "In this section, we will explore..."). What remains is pure technical substance: signatures, constraints, and logic.
- Technical Integrity: While prose is compressed, code blocks and API signatures are preserved with 100% fidelity. An agent must be able to copy-paste a signature from
context.mdand have it work in a production environment. - Noise Suppression: All UI-centric elements—navigation bars, footers, sidebars, and tracking scripts—are stripped during the cleaning phase before the Markdown is even generated.
- Keyword Optimization: The content is formatted to be "search-friendly" for vector databases, ensuring that semantic queries for specific functions or error codes return the most relevant snippets.
Usage in RAG and Agent Memory
The primary goal of a context.md file is to serve as a Memory Expansion Pack.
For Vector Databases (RAG)
Because the file is pre-cleaned and structured, you can chunk the context.md file by its --- delimiters. This creates clean, self-contained chunks that include the source URL, significantly improving the accuracy of citations in RAG-based chatbots.
For Long-Context LLMs
If you are using a model with a large context window (like GPT-4o or Claude 3.5 Sonnet), you can simply provide the entire context.md file as a system prompt attachment.
Agent Instruction Example:
"You are an expert on [Tool]. Use the attached
context.mdas your primary source of truth. Do not hallucinate signatures; refer strictly to the documentation provided in the file."
Example Output
A typical context.md snippet for a technical library looks like this:
## Source: [Authentication](https://api.docs.com/auth)
### Authentication Overview
All requests require a Bearer token in the `Authorization` header.
### Endpoints
`POST /v1/login`
- **Body**: `{ "email": "string", "password": "string" }`
- **Returns**: `200 OK` with `{ "token": "JWT_STRING" }`
### Error Codes
- `401`: Invalid credentials
- `403`: Account locked/Rate limited
---
By adhering to this standard, Moltext ensures that your autonomous agents spend less time "browsing" and more time "executing" based on accurate, high-density technical data.