Using Local Models
Local Inference Support
Moltext is designed to be model-agnostic. While it defaults to OpenAI, you can point the compiler to any OpenAI-compatible API endpoint. This is ideal for maintaining data privacy, avoiding API costs, or running high-volume compilations without rate limits.
Why Use Local Models?
- Privacy: Keep documentation scraping and processing entirely on your local machine.
- No Cost: Process thousands of pages without incurring token costs.
- Offline Mode: Compile context for your agents even without an active internet connection (provided the documentation is mirrored or locally accessible).
Connecting to Ollama
Ollama exposes an OpenAI-compatible endpoint for its local API. To use Ollama with Moltext, ensure your Ollama server is running, then specify the local endpoint and model:
moltext https://docs.example.com \
--base-url http://localhost:11434/v1 \
--model llama3
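Because Ollama speaks the same wire format as OpenAI, the flags above simply redirect standard chat-completion requests to localhost. As an illustration (names here are illustrative, not Moltext's actual internals), the `--base-url`/`--model` pair implies requests shaped like this:

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build the URL and JSON body for an OpenAI-compatible
    /v1/chat/completions call. Illustrative sketch only."""
    # The base URL already ends in /v1; append the chat-completions route.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)

url, body = build_chat_request("http://localhost:11434/v1", "llama3", "Summarize this page.")
# url -> "http://localhost:11434/v1/chat/completions"
```

Any server that accepts this request shape (Ollama, LM Studio, vLLM, and similar) can stand in for OpenAI without code changes.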
Connecting to LM Studio
LM Studio allows you to host a local server for GGUF models.
- Open LM Studio and start the Local Server.
- Select your model and ensure the server is active (defaults to port 1234).
- Run Moltext:
moltext https://docs.example.com \
--base-url http://localhost:1234/v1 \
--model <your-loaded-model-identifier>
Configuration Parameters
To use a local model, you primarily interact with two flags:
--base-url, -u <url>: The endpoint of your local inference server.
- Note: Moltext detects if the URL is not api.openai.com. If a local URL is detected, the requirement for an API key is automatically bypassed (a dummy key is injected internally to satisfy the SDK).
--model, -m <model>: The exact string identifier of the model you have loaded in your local environment (e.g., mistral, phi3, llama3:8b).
Technical Logic for Local Auth
When using local inference, Moltext simplifies the authentication flow:
| Scenario | API Key Requirement | Internal Behavior |
| :--- | :--- | :--- |
| OpenAI (Default) | Required via -k or OPENAI_API_KEY | Direct connection to OpenAI. |
| Local Inference | Optional | If --base-url is custom, Moltext defaults to a placeholder key. |
| Raw Mode (-r) | Not Required | Skips LLM processing entirely. |
Optimization Tip for Local LLMs
Local models may vary in their ability to follow complex system prompts. Moltext uses a strict system prompt to ensure "deterministic" and "agent-readable" output. If you find your local model is adding conversational filler, ensure you are using a model capable of following system instructions (e.g., Instruct or Chat variants).