# Project Roadmap
Moltext is evolving from a single-URL compiler into a universal ingestion engine for agentic workflows. Our roadmap focuses on expanding the data sources available to agents and improving the structural density of the output.
## Phase 1: Expanded Format Support
While HTML is the primary medium for documentation, critical technical specifications often live in static files.
- PDF Ingestion: Support for parsing and normalizing PDF manuals, whitepapers, and API specifications into the `context.md` format.
- Markdown Native Support: Direct ingestion of existing `.md` and `.mdx` repositories (e.g., GitHub `docs/` folders) to bypass scraping overhead.
- OpenAPI/Swagger Integration: Specialized processing for JSON/YAML spec files to generate high-density API reference tables.
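The OpenAPI integration can be pictured as flattening a spec's `paths` object into a markdown table. A minimal sketch, assuming a JSON spec as input; `openapi_to_table` is a hypothetical name, and a real implementation would also resolve `$ref` pointers, parameters, and response schemas:

```python
import json

def openapi_to_table(spec_json: str) -> str:
    """Render the paths of an OpenAPI JSON spec as a markdown reference table.

    Illustrative sketch only: it reads each operation's summary and ignores
    parameters, request bodies, and $ref resolution.
    """
    spec = json.loads(spec_json)
    rows = ["| Method | Path | Summary |", "| --- | --- | --- |"]
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            rows.append(f"| {method.upper()} | {path} | {op.get('summary', '')} |")
    return "\n".join(rows)
```

A table like this is far denser for an agent to scan than the raw YAML/JSON spec, which is the motivation behind this roadmap item.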
## Phase 2: Structural Intelligence
Improving how Moltext handles complex, fragmented information across ecosystems.
- Multi-Domain Documentation Mapping: Currently, the crawler stays within a single domain. Future updates will allow "Trusted Domain" lists, enabling agents to follow documentation that spans multiple sites (e.g., a core library site and its associated plugin ecosystem).
- Cross-Reference Normalization: Automatically resolving and rewriting relative links within the `context.md` to maintain internal consistency when the agent navigates the compiled memory.
- Sitemap-Aware Crawling: Faster discovery of documentation structures by prioritizing `sitemap.xml` and `robots.txt` paths.
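Sitemap-aware discovery amounts to reading the `<loc>` entries from a site's `sitemap.xml` instead of spidering link by link. A minimal sketch using only the standard library; the function name is hypothetical and fetching, sitemap-index recursion, and `robots.txt` parsing are left out:

```python
import xml.etree.ElementTree as ET

# Sitemaps use a fixed XML namespace defined by sitemaps.org.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Extract the <loc> URLs from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

Seeding the crawl queue from this list lets a crawler discover every documented page up front rather than depending on internal link coverage.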
## Phase 3: Agent-Centric Optimizations
Features designed to lower latency and improve retrieval-augmented generation (RAG) performance.
- Context Window Chunking: Automatic splitting of `context.md` into model-optimized chunks (e.g., 32k or 128k blocks) with overlapping headers to prevent context loss.
- Local Embedding Generation: An optional flag to output a `.vector` or `.jsonl` file alongside the markdown, ready for immediate injection into vector databases such as Chroma or Pinecone.
- Incremental Updates: A "watch" mode that re-compiles only the pages that have changed since the last run, reducing LLM token consumption.
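The chunking idea above can be sketched as a budgeted splitter that carries the most recent heading into each new chunk, so no section loses its context. This is an illustrative sketch, not Moltext's implementation: the token budget is approximated by whitespace-separated words, and `chunk_markdown` is a hypothetical name:

```python
def chunk_markdown(md: str, max_tokens: int = 32_000) -> list[str]:
    """Split markdown into chunks under a rough token budget, repeating the
    most recent heading at the top of each new chunk (the "overlapping
    header"). Tokens are approximated as whitespace-separated words."""
    chunks: list[str] = []
    current: list[str] = []
    heading = ""
    count = 0
    for line in md.splitlines():
        if line.startswith("#"):
            heading = line  # remember the active section heading
        words = len(line.split())
        if current and count + words > max_tokens:
            chunks.append("\n".join(current))
            # Start the next chunk with the active heading, unless this
            # line is itself a heading and will supply its own context.
            if heading and not line.startswith("#"):
                current, count = [heading], len(heading.split())
            else:
                current, count = [], 0
        current.append(line)
        count += words
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A production version would count tokens with the target model's tokenizer and track the full heading path (h1 > h2 > h3) rather than a single heading, but the splitting logic is the same.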
To suggest a feature or report a bug, please open an issue in the GitHub Repository.