Processing Modes: Raw vs. LLM
Moltext offers two distinct modes for documentation compilation. Choosing the right mode depends on your requirements for processing speed, token economy, and the specific "intelligence" level of the consuming agent.
1. Raw Mode (--raw)
Raw mode is the high-performance, structural ingestion engine. It transforms HTML into clean Markdown using a deterministic parsing pipeline without the use of an LLM.
- Mechanism: Moltext strips navigation, footers, scripts, and styles using cheerio, then converts the remaining semantic content to Markdown via Turndown.
- Best For:
- High-speed ingestion of massive documentation sets.
- Offline use or environments without API access.
- Agents with large context windows that prefer "unfiltered" ground truth.
- Key Advantage: Zero cost and no API key required.
Usage:
moltext https://docs.example.com --raw
2. LLM Mode (Default)
LLM mode acts as a "Documentation Compiler." After the initial raw parse, Moltext passes the content through an LLM (OpenAI or local) to compress and optimize the text for agentic logic.
- Mechanism: It uses a specialized system prompt to remove "conversational filler" (e.g., "In this guide, you will learn..."), fixes broken markdown formatting, and optimizes the structure for vector retrieval.
- Best For:
- Minimizing token usage in agent context windows.
- Converting "chatty" human-centric blogs into high-density technical specs.
- RAG (Retrieval-Augmented Generation) systems where keyword density and clarity are paramount.
- Key Advantage: Extremely high-density, near-deterministic context that strips noise while strictly preserving code signatures.
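The compiler stage boils down to a chat-completion request with a strict system role and a low temperature. A minimal sketch of such a request payload follows; the system prompt wording and model name are assumptions, not Moltext's internal prompt — only the shape of the call is the point.

```javascript
// Sketch of the request payload an LLM-mode compiler might send.
// The prompt text and default model are illustrative assumptions.
function buildCompressionRequest(rawMarkdown, model = 'gpt-4o-mini') {
  return {
    model,
    temperature: 0.1, // near-deterministic: distill, don't invent
    messages: [
      {
        role: 'system',
        content:
          'You compress documentation for AI agents. Remove ' +
          'conversational filler, fix broken Markdown, and preserve ' +
          'all code blocks and API signatures verbatim.',
      },
      { role: 'user', content: rawMarkdown },
    ],
  };
}

const req = buildCompressionRequest('# Guide\n\nIn this guide, you will learn...');
console.log(req.temperature); // 0.1
```

The same payload shape works against any OpenAI-compatible endpoint, which is why the local Ollama invocation below only needs a different base URL and model name.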
Usage (OpenAI):
moltext https://docs.example.com -k your_openai_key
Usage (Local Inference/Ollama):
moltext https://docs.example.com \
--base-url http://localhost:11434/v1 \
--model llama3
Comparison Matrix
| Feature | Raw Mode (--raw) | LLM Mode (Default) |
| :--- | :--- | :--- |
| Speed | Fast (no LLM calls) | Batch-processed (approx. 5 pages/batch) |
| Cost | Free | Token-based (or local compute) |
| Auth | No API Key needed | Requires API Key or Local Endpoint |
| Output Type | Structural Markdown | Compressed Agentic Context |
| Noise Reduction | Removes UI/HTML noise | Removes UI noise + linguistic filler |
| Code Integrity | High | Absolute (Strictly Preserved) |
Optimization Logic
In LLM Mode, the processor is configured with a temperature of 0.1 and a strict system role. This ensures that the compiler does not hallucinate new features but instead "distills" existing documentation into its most potent form. If the LLM call fails for any reason, Moltext gracefully falls back to the Raw output for that specific page, ensuring your context.md is never empty.
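The per-page fallback described above amounts to a try/catch around the compression step. A sketch under assumed names (compressWithLlm is hypothetical, not a Moltext API):

```javascript
// Sketch of per-page graceful fallback: if the LLM call throws,
// keep the deterministic raw Markdown so context.md is never empty.
// compressWithLlm is a hypothetical name for the LLM compression step.
async function processPage(rawMarkdown, compressWithLlm) {
  try {
    return await compressWithLlm(rawMarkdown);
  } catch (err) {
    // Fall back to raw output for this page only; other pages
    // in the batch are unaffected.
    console.warn(`LLM failed (${err.message}); using raw output.`);
    return rawMarkdown;
  }
}

// Example: a failing LLM call still yields usable output.
const failingLlm = async () => { throw new Error('rate limited'); };
processPage('# Raw page', failingLlm).then((out) => console.log(out)); // "# Raw page"
```

Scoping the catch to a single page means one rate limit or network error degrades only that page to raw mode rather than aborting the whole compilation.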