# Agent-Native Documentation

## Why Human-First Documentation Fails Agents
Traditional documentation is designed for human consumption. It is characterized by:
- Fractured Navigation: Content split across hundreds of nested HTML pages.
- Visual Noise: Heavy CSS, JavaScript trackers, and DOM elements (headers, footers, sidebars) that consume token budgets without adding value.
- Context Fragmentation: Internal linking structures that force agents to "browse" and lose state.
Moltext bridges this gap by transforming chaotic, human-centric web docs into a single, high-density, deterministic `context.md` file optimized for LLM context windows and vector retrieval.
## The Agentic Ingestion Pipeline
Moltext follows a three-stage pipeline to ensure documentation is "Agent-Ready":
- Crawl: Recursively parses a target domain, staying within the specified depth and domain boundaries.
- Normalize: Strips non-content elements (nav, scripts, footers) and converts HTML to clean, structural Markdown.
- Compress (Optional): Uses an LLM to remove conversational filler and "marketing speak," focusing strictly on API signatures, logic, and technical constraints.
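The Normalize stage can be approximated with standard HTML parsing. The sketch below is purely illustrative, not Moltext's actual implementation; it uses only the Python standard library to drop non-content elements and keep headings and paragraphs:

```python
# Illustrative sketch of the Normalize stage: strip nav/script/footer
# chrome and emit minimal Markdown. NOT Moltext's real code.
from html.parser import HTMLParser

SKIP = {"nav", "script", "style", "footer", "header", "aside"}

class Normalizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a non-content element
        self.current = None   # tag currently being captured
        self.buf = []
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in {"h1", "h2", "h3", "p"}:
            self.current, self.buf = tag, []

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.current:
            text = " ".join("".join(self.buf).split())
            if text:
                prefix = "#" * int(tag[1]) + " " if tag.startswith("h") else ""
                self.lines.append(prefix + text)
            self.current = None

    def handle_data(self, data):
        if self.current and not self.skip_depth:
            self.buf.append(data)

def normalize(html: str) -> str:
    parser = Normalizer()
    parser.feed(html)
    return "\n\n".join(parser.lines)
```

A real normalizer also has to handle code blocks, tables, and relative links; the point here is only the shape of the transformation: tree in, flat Markdown out.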
## Technical Interface
Moltext is primarily used via its CLI, serving as a "Skill" for autonomous agents or a pre-processing tool for developers.
### Basic Compilation
To compile a documentation site into a single context file:
```shell
moltext https://docs.example.com --output tool_context.md
```
### Raw Mode vs. LLM Enhancement
Moltext provides two distinct strategies for processing documentation:
#### 1. Raw Mode (`--raw`)
Recommended when deterministic, maximally faithful output matters. It performs structural normalization without invoking an LLM.
- Pros: Zero cost, fast, no API key required, 100% faithful to source text.
- Use Case: High-fidelity API references where every character counts.
```shell
moltext https://docs.example.com --raw
```
#### 2. LLM-Enhanced Mode
Uses a processing model (e.g., `gpt-4o-mini` or a local Llama model) to compress the documentation.
- Pros: Lower token usage in your final agent prompt, removes "noise," optimizes for RAG.
- Use Case: Large documentation suites with significant "how-to" filler.
```shell
moltext https://docs.example.com --key YOUR_API_KEY --model gpt-4o-mini
```
## CLI Configuration Reference
| Option | Argument | Description | Default |
| :--- | :--- | :--- | :--- |
| `<url>` | string | The base URL of the documentation to ingest. | (Required) |
| `-k, --key` | string | API key for LLM processing (OpenAI or compatible). | `OPENAI_API_KEY` env var |
| `-u, --base-url` | string | Endpoint for the LLM (e.g., `http://localhost:11434/v1` for Ollama). | OpenAI API |
| `-m, --model` | string | Model name for processing. | `gpt-4o-mini` |
| `-r, --raw` | boolean | Skips LLM processing; returns clean, structural Markdown. | `false` |
| `-o, --output` | string | Path to the resulting context file. | `context.md` |
| `-l, --limit` | number | Maximum number of pages to crawl. | `100` |
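These flags compose freely, so the CLI is easy to drive from a script. Below is a hypothetical Python wrapper (the `build_cmd` and `compile_docs` names are illustrative, and it assumes the `moltext` binary is on `PATH`):

```python
# Build and run a moltext invocation from the flags in the table above.
# Hypothetical wrapper; assumes `moltext` is installed and on PATH.
import subprocess

def build_cmd(url: str, output: str = "context.md",
              limit: int = 100, raw: bool = True) -> list[str]:
    cmd = ["moltext", url, "--output", output, "--limit", str(limit)]
    if raw:
        cmd.append("--raw")  # skip LLM processing entirely
    return cmd

def compile_docs(url: str, **kwargs) -> None:
    # check=True raises CalledProcessError if the crawl fails.
    subprocess.run(build_cmd(url, **kwargs), check=True)
```

Passing the argument list directly (rather than a shell string) avoids quoting issues with URLs.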
## Local Inference Integration
For air-gapped or cost-sensitive workflows, Moltext supports local inference servers (Ollama, LM Studio, vLLM) via the OpenAI-compatible endpoint flag (`--base-url`):
```shell
moltext https://docs.example.com \
  --base-url http://localhost:11434/v1 \
  --model llama3
```
## Output Structure
The resulting `context.md` is structured for immediate ingestion into an agent's memory or a vector database:
```markdown
# Documentation Context
Compiled by Moltext from [URL] at [ISO-Timestamp]

---

## Source: [Page Title](https://docs.example.com/api-ref)

[High-density technical content, code blocks, and signatures]

---
```
This format ensures that the agent understands the provenance of the information while maintaining a linear, easily searchable stream of technical truth.
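Because every source page is introduced by a `## Source:` heading (per the template above), the file splits cleanly into per-page chunks for embedding. A minimal sketch; the `split_context` function name is hypothetical:

```python
# Split a compiled context.md into per-page chunks keyed by their
# "## Source:" heading, ready to embed into a vector store.
def split_context(text: str) -> dict[str, str]:
    chunks: dict[str, str] = {}
    title, body = None, []
    for line in text.splitlines():
        if line.startswith("## Source:"):
            if title:  # flush the previous page's chunk
                chunks[title] = "\n".join(body).strip()
            title, body = line[len("## Source:"):].strip(), []
        elif title and line.strip() != "---":  # skip section dividers
            body.append(line)
    if title:
        chunks[title] = "\n".join(body).strip()
    return chunks
```

Keying each chunk by its source link preserves provenance through the retrieval layer: an agent can cite the originating page alongside any retrieved passage.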