# Raw vs. AI Processing
Moltext provides two distinct modes for transforming human-centric documentation into agentic context. Choosing the right mode depends on your performance requirements, token budget, and the intended use case for your Moltbot.
## ⚡️ Raw Mode (`--raw`)
Raw mode is a high-speed, structural transformation that bypasses the LLM layer. It focuses on normalizing "dirty" HTML into clean, deterministic Markdown.
**How it works:**
- Clutter Removal: Strips navigation bars, footers, sidebars, scripts, and styles using structural heuristics.
- HTML-to-MD Conversion: Converts the remaining main content into standard Markdown.
- No-Latency Stream: Returns the output immediately without waiting for API inference.
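The clutter-removal and conversion steps above can be sketched with Python's standard `html.parser`. This is an illustration only: the tag list and the Markdown rules below are assumptions for the sketch, not Moltext's actual heuristics.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as page clutter. The real Moltext
# heuristics are not documented here; this list is an illustrative assumption.
CLUTTER = {"nav", "header", "footer", "aside", "script", "style"}

class RawNormalizer(HTMLParser):
    """Toy --raw style pass: drop clutter subtrees, emit rough Markdown."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a clutter subtree

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in ("h1", "h2", "h3"):
                self.out.append("#" * int(tag[1]) + " ")   # <h2> -> "## "
            elif tag == "li":
                self.out.append("- ")
            elif tag == "code":
                self.out.append("`")

    def handle_endtag(self, tag):
        if tag in CLUTTER:
            self.skip_depth -= 1
        elif self.skip_depth == 0:
            if tag == "code":
                self.out.append("`")
            elif tag in ("p", "li", "h1", "h2", "h3"):
                self.out.append("\n")

    def handle_data(self, data):
        # Text inside a clutter subtree never reaches the output.
        if self.skip_depth == 0:
            self.out.append(data)

def normalize(html: str) -> str:
    parser = RawNormalizer()
    parser.feed(html)
    return "".join(parser.out)
```

Because no model is consulted, a pass like this is deterministic: the same HTML always yields the same Markdown.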
**When to use it:**
- Zero-Cost Ingestion: When you want to compile massive documentation sets without consuming LLM tokens.
- Custom RAG Pipelines: If you are feeding the output into your own vector database or embedding model and want the full, uncompressed text.
- Offline Environments: When you don't have an internet connection or an API key available.
```shell
# Compile documentation using only structural normalization
moltext https://docs.example.com --raw -o raw_context.md
```
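If you are routing `--raw` output into your own RAG pipeline, a common downstream step is heading-aligned chunking before embedding. The sketch below shows that consumer side; the `max_chars` budget and the split-on-headings strategy are arbitrary choices for illustration, not part of Moltext.

```python
import re

def chunk_markdown(md: str, max_chars: int = 1200):
    """Split compiled Markdown into heading-aligned chunks for embedding.

    Chunks begin at h1-h3 headings; oversized sections are further split
    on blank lines so each chunk stays under the character budget.
    """
    # Zero-width split: cut immediately before each heading line.
    sections = re.split(r"(?m)^(?=#{1,3} )", md)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        if len(sec) <= max_chars:
            chunks.append(sec)
            continue
        # Section too large: greedily pack paragraphs up to the budget.
        buf = ""
        for para in sec.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```

Each chunk can then be embedded and stored in whichever vector database your stack uses.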
## 🧠 AI Processing (Default)
AI Processing is the standard mode for creating "high-density" memories. It uses a secondary LLM pass to compress the documentation semantically, making it significantly more efficient for an agent to read.
**How it works:**
- Normalization: Performs the same cleaning and conversion as Raw Mode.
- Semantic Compression: An LLM (default: `gpt-4o-mini`) removes conversational filler ("In this guide, we will..."), repetitive intros, and marketing fluff.
- Constraint Preservation: Explicitly protects code blocks, API signatures, and technical requirements.
- Structural Optimization: Re-formats headers and logic flow to be optimized for LLM "eyes" and vector retrieval.
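One way to reconcile semantic compression with strict code preservation is to mask fenced code blocks before the LLM pass and restore them verbatim afterward. Whether Moltext works this way internally is an assumption; `summarize` below is a stand-in for the real model call (e.g. to `gpt-4o-mini`), not an actual API.

```python
import re

# Matches fenced code blocks, including the fences themselves.
FENCE = re.compile(r"```.*?```", re.DOTALL)

def compress(md: str, summarize) -> str:
    """Sketch of a code-preserving compression pass.

    Code fences are swapped for opaque placeholders before the prose is
    handed to `summarize` (the LLM stand-in), then restored byte-for-byte,
    so compression can never alter a code block.
    """
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return f"\x00CODE{len(blocks) - 1}\x00"

    masked = FENCE.sub(stash, md)
    compressed = summarize(masked)           # LLM rewrites prose only
    for i, code in enumerate(blocks):        # restore original code
        compressed = compressed.replace(f"\x00CODE{i}\x00", code)
    return compressed
```

The same masking trick extends to API signatures or any other span the pass must keep intact.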
**When to use it:**
- Context Window Optimization: When you need to fit a large library into the limited context window of a model such as GPT-4o or Claude 3.5 Sonnet.
- Agent Learning: When the documentation is disorganized or contains heavy conversational "noise" that might cause agent hallucination.
- API Reference Generation: To generate clean, strictly technical summaries of function signatures and parameters.
```shell
# Compile documentation with semantic compression
moltext https://docs.example.com --key sk-... --model gpt-4o-mini
```
## Comparison Matrix
| Feature | Raw Mode (--raw) | AI Processing (Default) |
| :--- | :--- | :--- |
| Speed | Near-instant (no inference wait) | Dependent on LLM latency |
| API Key | Not Required | Required (OpenAI or Local) |
| Token Usage | $0.00 | Variable (Input + Output) |
| Noise Level | Medium (Text-heavy) | Low (High-density/Technical) |
| Code Integrity | 100% | 100% (Strictly Preserved) |
| Best For | Massive data, RAG ingestion | Immediate agent reasoning, context-saving |
## Usage Tip: The Hybrid Approach
For the best results, use Raw Mode for your permanent archival knowledge base (long-term memory) and AI Processing for the specific modules your agent is actively coding against (working memory).