Mnemo Launches as Local-First AI Memory Layer

• Mnemo is a new Rust-based local memory layer for LLMs that uses SQLite and petgraph.
• The system extracts entities and builds a persistent knowledge graph to inject context into prompts.
• It operates without cloud dependencies and performs full retrieval pipelines in approximately 4.2 ms.

Mnemo, a new local-first memory layer for large language models (LLMs), has been released as an open-source tool written in Rust. The project functions as a sidecar service that extracts entities, maintains a persistent knowledge graph in SQLite, and injects scored, relevant context into LLM prompts without requiring a cloud connection or Python runtime. Designed for developers building custom pipelines, the system processes conversations or documents by sending raw text to an LLM for entity and relationship extraction. Once ingested, these entities are deduplicated and linked within a petgraph structure that supports multi-hop traversal at query time.
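The deduplicate-and-link step can be pictured with a small sketch. This is not Mnemo's code: the `EntityGraph` type, its method names, and the normalization rule (trim, collapse whitespace, lowercase) are assumptions made for illustration, and it uses only the Rust standard library rather than SQLite and petgraph.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory store (illustration only; Mnemo persists to SQLite
/// and links entities in a petgraph structure).
#[derive(Default)]
pub struct EntityGraph {
    /// Normalized name -> entity id.
    index: HashMap<String, usize>,
    /// Entity id -> display name as first seen.
    names: Vec<String>,
    /// Directed edges stored as (source id, relation label, target id).
    edges: HashSet<(usize, String, usize)>,
}

impl EntityGraph {
    /// Assumed normalization rule: trim, collapse whitespace, lowercase.
    fn normalize(name: &str) -> String {
        name.split_whitespace()
            .collect::<Vec<_>>()
            .join(" ")
            .to_lowercase()
    }

    /// Returns the existing id when the normalized name is already known,
    /// otherwise registers a new entity.
    pub fn upsert_entity(&mut self, name: &str) -> usize {
        let key = Self::normalize(name);
        if let Some(&id) = self.index.get(&key) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.trim().to_string());
        self.index.insert(key, id);
        id
    }

    /// Links two entities; an identical relation seen twice is stored once.
    pub fn link(&mut self, source: &str, relation: &str, target: &str) {
        let from = self.upsert_entity(source);
        let to = self.upsert_entity(target);
        self.edges.insert((from, relation.to_string(), to));
    }

    pub fn entity_count(&self) -> usize {
        self.names.len()
    }

    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }
}
```

Under these assumptions, ingesting the same relation twice with different casing or spacing leaves the graph unchanged, which is the property deduplication is meant to provide.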

Retrieval follows a six-stage pipeline whose stages include full-text chunk search, entity name matching, and breadth-first search (BFS) graph expansion; the pipeline ranks the results and assembles a context string in under 50 ms. Users can interact with the system via a REST API, a command-line interface (CLI), or a Python SDK. It supports any OpenAI-compatible API, including locally hosted models via Ollama. Benchmarks run on an Apple M2 processor put the full retrieval pipeline at approximately 4.2 ms per request, or roughly 238 operations per second.
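To illustrate the BFS graph expansion stage, here is a minimal sketch using only the Rust standard library. The `Adjacency` type, the `expand_bfs` and `hop_score` functions, and the distance-based scoring rule are hypothetical; the article does not say how Mnemo weights entities, and the project itself traverses a petgraph structure.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Adjacency list keyed by entity name (hypothetical shape for illustration).
pub type Adjacency = HashMap<String, Vec<String>>;

/// Breadth-first expansion from seed entities, following at most `max_hops` edges.
/// Returns (entity, hop distance) pairs in visit order; seeds have distance 0.
pub fn expand_bfs(graph: &Adjacency, seeds: &[&str], max_hops: usize) -> Vec<(String, usize)> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut out = Vec::new();

    // Seeds would come from earlier stages such as entity name matching.
    for seed in seeds {
        if seen.insert(seed.to_string()) {
            queue.push_back((seed.to_string(), 0));
        }
    }

    while let Some((node, hops)) = queue.pop_front() {
        out.push((node.clone(), hops));
        if hops == max_hops {
            continue; // Do not expand beyond the hop budget.
        }
        if let Some(neighbors) = graph.get(&node) {
            for next in neighbors {
                // `insert` returns false for already-visited nodes, which avoids cycles.
                if seen.insert(next.clone()) {
                    queue.push_back((next.clone(), hops + 1));
                }
            }
        }
    }
    out
}

/// Assumed scoring rule: closer entities weigh more (1.0 at hop 0, 0.5 at hop 1, ...).
pub fn hop_score(hops: usize) -> f64 {
    1.0 / (1.0 + hops as f64)
}
```

Capping the hop count keeps the expansion bounded on dense graphs, which is one plausible way a traversal like this stays within a millisecond-scale latency budget.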

The repository includes 12 benchmark suites and 122 automated tests. Configuration is managed through environment variables or TOML files, allowing users to specify the LLM provider, base URL, and model. Mnemo is released under the MIT license and is available on GitHub for deployment via Docker or direct binary installation. The architecture is modular, composed of four primary Rust crates: mnemo-core for core engine logic, mnemo-api for REST handlers, mnemo-cli for terminal operations, and mnemo-bench for hardware-specific performance testing.
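Environment-based configuration of the provider, base URL, and model might look like the following sketch. The variable names (`MNEMO_LLM_PROVIDER`, `MNEMO_LLM_BASE_URL`, `MNEMO_LLM_MODEL`), the `LlmConfig` type, and the default values are all assumptions made for illustration; the project's documentation is the authority on the real keys. TOML parsing is left out to keep the example free of external crates.

```rust
use std::env;

/// Hypothetical settings struct; the fields mirror the options the article lists.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub provider: String,
    pub base_url: String,
    pub model: String,
}

impl LlmConfig {
    /// Builds a config from any key lookup, falling back to placeholder defaults.
    /// Key names and defaults are assumptions, not Mnemo's documented values.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        // Treat unset and blank values the same way: use the default.
        let get = |key: &str, default: &str| {
            lookup(key)
                .filter(|value| !value.trim().is_empty())
                .unwrap_or_else(|| default.to_string())
        };
        LlmConfig {
            provider: get("MNEMO_LLM_PROVIDER", "ollama"),
            base_url: get("MNEMO_LLM_BASE_URL", "http://localhost:11434/v1"),
            model: get("MNEMO_LLM_MODEL", "example-model"),
        }
    }

    /// Reads the same keys from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Routing the lookup through a closure lets the same logic be exercised against a fixed map in tests without mutating the process environment.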
