# Local Inference Setup
Moltext is designed to be provider-agnostic. While it defaults to OpenAI, you can route the compilation process through local inference servers like Ollama or LM Studio. This is the preferred "Shared Brain" flow for users prioritizing privacy, local-first agentic workflows, or cost reduction.
## Why Use Local Inference?
- Data Privacy: Documentation content never leaves your local machine.
- Zero Cost: Process large documentation sets without paying for API tokens.
- Agent Sovereignty: Keep your agent's learning infrastructure entirely self-hosted.
## Configuration Overview
To use a local provider, you must override the default OpenAI endpoint and specify your local model name using the following flags:
| Flag | Description | Default |
| :--- | :--- | :--- |
| `-u`, `--base-url` | The URL of your local inference server (OpenAI-compatible) | `https://api.openai.com/v1` |
| `-m`, `--model` | The specific model tag/name installed on your local server | `gpt-4o-mini` |
> [!NOTE]
> When a custom `--base-url` is detected that does not point to OpenAI, Moltext automatically bypasses the API key requirement by injecting a dummy key. You do not need to provide the `-k` flag for local setups.
## Provider Examples
### 1. Ollama
Ollama provides an OpenAI-compatible API on port 11434.
- Ensure Ollama is running and you have pulled a model (e.g., `llama3` or `mistral`).
- Run Moltext pointing at the local endpoint:

```shell
moltext https://docs.example.com \
  --base-url http://localhost:11434/v1 \
  --model llama3
```
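Before running the command above, you can confirm the model is actually installed. A minimal sketch using the `ollama` CLI (assumes `llama3` as in the example; substitute any tag you use):

```shell
MODEL="llama3"

# Verify the model is pulled; pull it if missing.
# Skip gracefully when the ollama CLI is not on PATH.
if command -v ollama > /dev/null 2>&1; then
  ollama list | grep -q "$MODEL" || ollama pull "$MODEL"
  ollama list
else
  echo "ollama CLI not found on PATH"
fi
```

The tag printed by `ollama list` is exactly what you pass to `--model`.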
### 2. LM Studio
LM Studio allows you to host local LLMs with a one-click local server.
- Open LM Studio and start the Local Server (typically on port `1234`).
- Ensure that "Cross-Origin Resource Sharing (CORS)" is enabled, if applicable.
- Run Moltext:

```shell
moltext https://docs.example.com \
  --base-url http://localhost:1234/v1 \
  --model <your-loaded-model-id>
```
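If you are unsure what identifier to use for `--model`, an OpenAI-compatible server reports its loaded models at `GET /v1/models`. A sketch, assuming LM Studio's default port 1234 and `curl`:

```shell
BASE_URL="http://localhost:1234/v1"

# Ask the server which models it is serving; the "id" fields in the
# JSON response are the exact strings to pass to --model.
MODELS_JSON=$(curl -sf --max-time 2 "$BASE_URL/models" 2> /dev/null) || MODELS_JSON=""

if [ -n "$MODELS_JSON" ]; then
  echo "$MODELS_JSON"
else
  echo "No response from $BASE_URL (is the LM Studio local server started?)"
fi
```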
## Advanced Usage: The Hybrid "Raw" Fallback
If your local model struggles with complex HTML-to-Markdown normalization, you can pass the `--raw` flag to skip LLM processing entirely. This bypasses all inference and relies on Moltext's internal deterministic parsing engine.
```shell
moltext https://docs.example.com --raw
```
## Troubleshooting Local Connections
- Connection Refused: Ensure your local server is actually running the API service and that the port matches your `--base-url`.
- Model Not Found: Verify the model name with `ollama list` or the LM Studio dashboard. The string passed to `--model` must match the provider's identifier exactly.
- Timeouts: Compiling large documentation sites locally depends on your hardware. For massive sites, consider increasing `--limit` gradually.
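The first two checks above can be scripted before a long compile run. A minimal sketch, assuming an Ollama-style endpoint on port 11434 and `curl` (adjust `BASE_URL` and `MODEL` to your setup):

```shell
BASE_URL="http://localhost:11434/v1"
MODEL="llama3"

# Connection check: a failed request here usually means the server
# process is not running or the port in --base-url is wrong.
if ! curl -sf --max-time 2 "$BASE_URL/models" > /dev/null 2>&1; then
  echo "cannot reach $BASE_URL"
  RESULT="connection-failed"
# Model check: the id passed to --model must appear in the server's list.
elif curl -sf --max-time 2 "$BASE_URL/models" | grep -q "\"$MODEL\""; then
  RESULT="ok"
else
  echo "server is up, but model '$MODEL' was not found"
  RESULT="model-missing"
fi
```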