Using Local Models
Local Inference Support
Moltext is designed to be model-agnostic. While it defaults to OpenAI, you can point the compiler to any OpenAI-compatible API endpoint. This is ideal for maintaining data privacy, avoiding API costs, or running high-volume compilations without rate limits.
Why Use Local Models?
- Privacy: Keep documentation scraping and processing entirely on your local machine.
- No Cost: Process thousands of pages without incurring token costs.
- Offline Mode: Compile context for your agents even without an active internet connection (provided the documentation is mirrored or locally accessible).
Connecting to Ollama
Ollama exposes an OpenAI-compatible endpoint for its local API. To use Ollama with Moltext, ensure your Ollama server is running, then specify the local endpoint and model:
moltext https://docs.example.com \
--base-url http://localhost:11434/v1 \
--model llama3
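Because Ollama speaks the same wire format as OpenAI, the flags above simply redirect standard chat-completion requests to localhost. As an illustration (names here are illustrative, not Moltext's actual internals), the `--base-url`/`--model` pair implies requests shaped like this:

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build the URL and JSON body for an OpenAI-compatible
    /v1/chat/completions call. Illustrative sketch only."""
    # The base URL already ends in /v1; append the chat-completions route.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)

url, body = build_chat_request("http://localhost:11434/v1", "llama3", "Summarize this page.")
# url -> "http://localhost:11434/v1/chat/completions"
```

Any server that accepts this request shape (Ollama, LM Studio, vLLM, and similar) can stand in for OpenAI without code changes.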
Connecting to LM Studio
LM Studio allows you to host a local server for GGUF models.
- Open LM Studio and start the Local Server.
- Select your model and ensure the server is active (defaults to port 1234).
- Run Moltext:
moltext https://docs.example.com \
--base-url http://localhost:1234/v1 \
--model <your-loaded-model-identifier>
Configuration Parameters
To use a local model, you primarily interact with two flags:
--base-url, -u <url>: The endpoint of your local inference server.
- Note: Moltext detects if the URL is not api.openai.com. If a local URL is detected, the requirement for an API key is automatically bypassed (a dummy key is injected internally to satisfy the SDK).
--model, -m <model>: The exact string identifier of the model you have loaded in your local environment (e.g., mistral, phi3, llama3:8b).
Technical Logic for Local Auth
When using local inference, Moltext simplifies the authentication flow:
| Scenario | API Key Requirement | Internal Behavior |
| :--- | :--- | :--- |
| OpenAI (Default) | Required via -k or OPENAI_API_KEY | Direct connection to OpenAI. |
| Local Inference | Optional | If --base-url is custom, Moltext defaults to a placeholder key. |
| Raw Mode (-r) | Not Required | Skips LLM processing entirely. |
Optimization Tip for Local LLMs
Local models may vary in their ability to follow complex system prompts. Moltext uses a strict system prompt to ensure "deterministic" and "agent-readable" output. If you find your local model is adding conversational filler, ensure you are using a model capable of following system instructions (e.g., Instruct or Chat variants).