Installation
Install grepvec with a single command:
Or build from source:
Requirements: Rust 1.75+, Docker (for local mode).
The install script places the grepvec binary in ~/.grepvec/bin and adds it to your PATH. It also installs grepvec-ui for visualization.
Quick Start
Get code intelligence for any project in four commands:
grepvec init does everything: starts Docker containers, runs database migrations, parses your codebase with tree-sitter, generates biographies, and embeds them for neural search. After init, search works immediately.
Configuration
grepvec uses two configuration files:
- ~/.grepvec/credentials — database and vector backend credentials (global)
- .grepvec/scope.toml — which repositories are in scope (per-project)
Local mode credentials
Managed mode credentials (Enscribe)
Scope file
The scope file tells grepvec which repositories to parse and track. It is generated by grepvec init and can be edited manually:
The Research Loop
The core workflow is three commands: search, context, read. Each one narrows the focus. The pattern eliminates context window pollution — your agent (or you) gets exactly the code that matters.
search — Find what matters
Start with a natural language question. grepvec runs a two-pass search: keyword (tsvector) then neural (vector embeddings). Results are structurally ranked — not just text matches.
You now know which items are relevant. Next, understand how they connect.
context — Understand connections
Given an item name, context returns its biography and graph neighborhood: callers, callees, parent, dependencies. This is the structural context that grep cannot provide.
Now you know what the function does and how it connects. Time to read the source.
read — Get the source
Extracts the exact source code for an item — no more, no less. Your agent's context window gets the precise function body, not an entire 500-line file.
Three commands, zero noise. The agent has exactly the context it needs to reason correctly.
Commands Reference
Every grepvec command is a subcommand of the grepvec binary. Run grepvec --help for a full listing.
init
Provisions infrastructure, parses your codebase, and gets search working in one command. Creates Docker containers (local mode), runs database migrations, absorbs code, generates biographies, and embeds them for neural search.
- --repos <paths>: Comma-separated paths to repositories to include
- --db-url <url>: Use an existing Postgres database instead of Docker
- --local: Use local Docker containers for Postgres, Qdrant, and BGE (default)
- --managed: Use Enscribe managed backend for vector search
- --enscribe-key <key>: API key for Enscribe managed backend
- --config-only: Write config files without parsing or embedding
refresh
Session-start hook. Incrementally absorbs changes since the last parse run and refreshes stale biographies. Instant when nothing has changed.
- --scope <path>: Path to scope.toml (defaults to .grepvec/scope.toml)
absorb
Parses source code with tree-sitter and stores the extracted items and edges in Postgres. The deterministic foundation of the structural graph.
- --repo <name>: Absorb only a specific repository
- --all: Absorb all repositories in scope
- --changed-since <sha>: Only parse files changed since the given commit SHA
- --dry-run: Parse but do not write to the database
- --stats: Print summary statistics after absorption
- --migrate: Run database migrations before absorbing
document
Generates deterministic biographies for each item in the graph. Biographies include callers, callees, dependencies, parent module, and a structural summary. Stored in the annotations table.
- --repo <name>: Generate biographies for a specific repository
- --all: Generate for all repositories in scope
- --dry-run: Generate but do not write to the database
- --sample <n>: Generate biographies for a random sample of n items (useful for testing)
- --stale-only: Only regenerate biographies marked as stale
search
Two-pass search across the structural graph. Pass 1 uses Postgres tsvector keyword matching on biographies. Pass 2 uses vector embeddings for neural/semantic ranking. Natural language queries work out of the box.
- <query>: Natural language search query
- --repo <name>: Restrict search to a specific repository
- --limit <n>: Maximum results per pass (default: 10)
- --exact: Exact match mode (no fuzzy matching)
- --neural: Force neural search even if backend is not configured
- --no-neural: Skip neural pass, keyword only
- --collection-id <id>: Specify collection for neural search
context
Returns the biography and graph neighborhood for an item. Shows callers, callees, parent module, and structural relationships up to N hops away. This is the command that makes grep unnecessary for understanding code.
- <name>: Item name (qualified or unqualified, e.g., validate_signature or api::hmac_auth::validate_signature)
- --repo <name>: Disambiguate when the same name exists in multiple repos
- --hops <n>: Graph neighborhood depth (default: 1)
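One way to picture the --hops depth is as a breadth-first walk over call edges. The following is a minimal Python sketch under that assumption, not grepvec's implementation. The edge list, the neighborhood helper, and every item name other than validate_signature are hypothetical.

```python
from collections import deque

# Hypothetical call edges (caller, callee); not taken from a real grepvec graph.
EDGES = [
    ("handle_request", "validate_signature"),
    ("validate_signature", "compute_hmac"),
    ("compute_hmac", "sha256"),
]

def neighborhood(name, edges, hops=1):
    """Return {item: distance} for items within `hops` edges of `name`, in either direction."""
    adjacent = {}
    for caller, callee in edges:
        adjacent.setdefault(caller, set()).add(callee)
        adjacent.setdefault(callee, set()).add(caller)

    seen = {name: 0}  # item -> distance from the starting item
    queue = deque([name])
    while queue:
        current = queue.popleft()
        if seen[current] == hops:
            continue  # do not expand past the requested depth
        for other in adjacent.get(current, ()):
            if other not in seen:
                seen[other] = seen[current] + 1
                queue.append(other)
    del seen[name]  # the starting item is not part of its own neighborhood
    return seen

print(neighborhood("validate_signature", EDGES, hops=1))
print(neighborhood("validate_signature", EDGES, hops=2))
```

With hops=1 the walk stops at direct callers and callees; raising the depth pulls in their neighbors as well, which is why the default stays small.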
read
Extracts the exact source code for an item from disk. Returns only the function/struct/enum body with line numbers — not the entire file. Keeps the context window clean.
- <name>: Item name (qualified or unqualified)
- --repo <name>: Disambiguate when the same name exists in multiple repos
- -C <n>: Lines of context around the item (default: 0)
reconcile
Resolves cross-repo edges. When a function in repo A calls a function in repo B, the initial parse marks that edge as unresolved. Reconcile attempts to match these edges to items in other parsed repositories.
- --edges: Reconcile edges (required)
- --dry-run: Report what would be reconciled without writing
- --report: Print detailed reconciliation report
boundary
Manages inferred boundary nodes — representations of external systems your code depends on but you do not own. Boundary nodes are created by agent reasoning over unresolved edges.
- gaps: Show unresolved edges grouped by external crate/module
- list: List all existing boundary nodes with their metadata
- create --from <json>: Create a boundary node from a JSON definition
- resolve: Link unresolved edges to existing boundary nodes
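To make the gaps grouping concrete, here is a small Python sketch that groups unresolved edges by the leading segment of the target path. It assumes Rust-style :: paths and is only an illustration. The edge data and the group_gaps helper are hypothetical, and grepvec's actual grouping logic is not described here.

```python
from collections import defaultdict

# Hypothetical unresolved edges: (caller, unresolved target path).
UNRESOLVED = [
    ("store::upsert", "qdrant_client::Qdrant::upsert_points"),
    ("store::search", "qdrant_client::Qdrant::search_points"),
    ("api::serve", "tokio::spawn"),
]

def group_gaps(unresolved):
    """Group unresolved edges by the leading path segment of their target."""
    gaps = defaultdict(list)
    for caller, target in unresolved:
        crate = target.split("::", 1)[0]  # "qdrant_client::Qdrant::..." -> "qdrant_client"
        gaps[crate].append((caller, target))
    return dict(gaps)

for crate, edges in sorted(group_gaps(UNRESOLVED).items()):
    print(f"{crate}: {len(edges)} unresolved edge(s)")
```

Each group is a candidate for one boundary node, which is the unit that create and resolve then operate on.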
embed
Bulk-embeds biographies into the vector backend for neural search. Works with both local (BGE + Qdrant) and managed (Enscribe) backends.
- --collection-id <id>: Target collection ID
- --repo <name>: Embed biographies for a specific repository only
- --init: Create the collection, configure the voice, and embed in one step
- --boundary-nodes: Include boundary node descriptions in the embedding
- --dry-run: Count items that would be embedded without writing
remember
Agent memory system. Stores observations, decisions, and session summaries for persistence across agent sessions. Memories are stored in the vector backend and can be recalled by semantic query.
- write: Store a memory (reads from stdin)
- recall: Recall memories by semantic query
- --lane <lane>: Memory lane, one of session (ephemeral), project (per-project), or knowledge (permanent)
- --kind <kind>: Memory kind, one of decision, summary, error, or trace
mcp-server
Starts the MCP (Model Context Protocol) server. Runs as a stdio JSON-RPC server that exposes grepvec commands as tools to any MCP-compatible AI agent. No flags — configuration comes from grepvec's config files.
Agent Integration
grepvec provides two discovery mechanisms for AI agents: MCP (tools appear natively in the agent's tool palette) and subagent instructions (a markdown file the agent reads on session start). Both work. MCP is lower friction; subagent instructions give you more control.
MCP Server
The MCP server exposes grepvec commands as native tools via the Model Context Protocol. Any MCP-compatible agent discovers grepvec tools automatically.
Add to your project's .mcp.json:
The MCP server exposes these tools:
| Tool | Maps to | Description |
|---|---|---|
| search | grepvec search | Two-pass structural search |
| context | grepvec context | Biography + graph neighborhood |
| read | grepvec read | Precise source extraction |
| refresh | grepvec refresh | Incremental absorb |
| boundary | grepvec boundary | External dependency analysis |
| remember | grepvec remember | Persistent agent memory |
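As a rough illustration of what a stdio JSON-RPC tool call looks like, an MCP client invokes a tool by writing a tools/call request to the server's stdin as one line of JSON. The Python sketch below only builds such a request. The query and limit argument names are assumptions for illustration, and the schema the grepvec server actually advertises may differ.

```python
import json

def tools_call_request(request_id, tool, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

# Argument names are assumed; check the tool schema the server reports via tools/list.
line = tools_call_request(1, "search", {"query": "where are signatures validated", "limit": 5})
print(line)
```

An MCP-compatible agent does this for you; the sketch is only meant to show that each tool in the table is addressed by its name in params.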
Subagent Instructions
For agents that do not support MCP, grepvec generates a markdown instruction file at .grepvec/agent.md. This file teaches the agent how to invoke grepvec via shell commands. Point your agent's system prompt or instruction file to it.
The instruction file includes:
- Available commands and their flags
- The research loop pattern (search, context, read)
- When to call refresh (session start)
- How to use remember for persistent context
Claude Code
Claude Code discovers grepvec via MCP. Add the .mcp.json to your project root and grepvec tools appear in Claude Code's tool palette. Alternatively, symlink the agent instructions:
Claude Code will automatically discover and use the subagent for code intelligence queries.
Cursor
Cursor supports MCP natively. Add the .mcp.json configuration and grepvec tools appear in the Composer tool list. You can also add the agent instructions to Cursor's rules:
Codex
OpenAI Codex agents can use grepvec via shell commands. Include the agent instructions in your Codex system prompt or reference the instruction file:
The agent instructions are tool-agnostic — they work with any agent that can invoke shell commands.
Vector Backend
grepvec supports two vector backends for neural search. Keyword search (Pass 1) works with just Postgres and requires no vector backend at all.
Local (Docker)
The default. grepvec init --local starts three Docker containers:
| Container | Image | Port | Volume | Purpose |
|---|---|---|---|---|
| grepvec-db | postgres:16-alpine | 5432 | grepvec-pgdata | Structural graph storage |
| grepvec-qdrant | qdrant/qdrant:latest | 6333 | grepvec-qdrant-data | Vector similarity search |
| grepvec-bge | ghcr.io/huggingface/tei:latest | 8080 | grepvec-bge-cache | BGE-large-en-v1.5 embeddings |
All data persists in Docker volumes. Stop and restart containers freely — nothing is lost. The BGE model is downloaded and cached on first run (about 1.3 GB).
Why BGE-large-en-v1.5? In our evaluation campaign (100 queries, 6 categories, graded relevance), BGE-large achieved NDCG@10 of 0.79 and MRR of 0.89 — measurably better than OpenAI and Voyage embeddings, and it runs locally for free.
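For readers unfamiliar with the two metrics, here is a minimal Python sketch of how NDCG@k and MRR are commonly computed from graded relevance judgments. The relevance lists are made-up examples, not data from the evaluation campaign. The campaign's exact gain and discount choices are not stated here, so the sketch assumes linear gain with a log2 discount and normalizes against the best ordering of the retrieved results only.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=10):
    """DCG divided by the DCG of the same results sorted from most to least relevant."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def mrr(result_lists):
    """Mean reciprocal rank of the first relevant (grade > 0) result per query."""
    total = 0.0
    for relevances in result_lists:
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / rank
                break
    return total / len(result_lists)

# Made-up graded judgments (0 = irrelevant, 3 = highly relevant), one list per query.
queries = [[3, 2, 0, 1], [0, 3, 0, 0]]
print(round(ndcg(queries[0]), 3), mrr(queries))
```

NDCG rewards putting the most relevant items first across the whole top 10, while MRR only looks at where the first relevant item lands, so the two numbers answer slightly different questions.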
Enscribe (Managed)
For teams that want zero infrastructure management. Enscribe handles embedding generation, vector storage, and search. You provide an API key; Enscribe handles the rest.
Enscribe provides collections, voices (search profiles), and neural search ranking. grepvec is a consumer of Enscribe's API.
Configuration
| Feature | Local (Docker) | Enscribe (Managed) |
|---|---|---|
| Setup | One command, Docker required | API key, no infra |
| Embedding model | BGE-large-en-v1.5 | Configurable |
| Cost | Free (your hardware) | $5/seat + metered |
| Latency | Local network | Network round-trip |
| Data location | Your machine | Cloud |
| Search voices | Default voice | Custom voice profiles |
| Agent memory | Local Qdrant | Managed lanes |
Both backends are interchangeable. Switch between them by changing ~/.grepvec/credentials. The structural graph in Postgres is independent of the vector backend.
Visualization
grepvec-ui
A 3D force-directed graph visualization of your structural inventory, built with egui and backed by Postgres. Renders both deterministic items and inferred boundary nodes.
Launch directly or via hotkey:
The layout uses ForceAtlas2 — a force-directed algorithm that clusters related code items and pushes unrelated items apart. Heavily connected modules form visible clusters.
Controls
| Action | Control |
|---|---|
| Orbit | Click and drag |
| Zoom | Scroll wheel |
| Select area | Drag a box around a cluster to drill into it |
| Inspect node | Click a node to view its biography and edges |
| Quit | Press Escape twice |
Color Modes
Two coloring modes reveal different aspects of your codebase:
Layer mode: Colors by architectural tier. API handlers, storage layer, domain logic, and utilities each get a distinct color. Reveals the layered structure of your codebase.
IO mode: Colors by behavioral role. Items that read external data, write external data, do pure computation, or coordinate other items each get a distinct color. Reveals data flow patterns.
Node shapes encode item type:
| Shape | Item Type |
|---|---|
| Circle | Function |
| Rectangle | Struct |
| Diamond | Enum |
| Hexagon | Impl block |
| Ringed circle | Trait |
Architecture
How Parsing Works
grepvec uses tree-sitter to parse source code into a concrete syntax tree, then walks that tree to extract structural items and their relationships.
Items extracted: functions, structs, enums, traits, impl blocks, modules, constants, type aliases. Each item gets a qualified name (e.g., api::hmac_auth::validate_signature), file location, and line range.
Edges extracted: calls (function A calls function B), implements (struct implements trait), contains (module contains function), imports (file imports name). Edges are initially resolved within the same file, then within the same repo.
Parsing is deterministic: same code, same graph, every time. No AI, no heuristics, no drift.
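The resolution order for edges (same file first, then same repo) can be sketched as below. This is a simplified Python illustration and not grepvec's resolver. The Item type, the resolve_call helper, the sample items, and the alphabetical tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    qualified_name: str  # e.g. "api::hmac_auth::validate_signature"
    repo: str
    file: str

def resolve_call(caller, callee_name, items):
    """Resolve an unqualified callee: same file first, then same repo, else None."""
    candidates = [i for i in items if i.qualified_name.rsplit("::", 1)[-1] == callee_name]
    scopes = (
        lambda i: i.repo == caller.repo and i.file == caller.file,  # same file
        lambda i: i.repo == caller.repo,                            # same repo
    )
    for in_scope in scopes:
        matches = sorted(c.qualified_name for c in candidates if in_scope(c))
        if matches:
            return matches[0]  # sorted so the same input always yields the same edge
    return None  # left unresolved for a later cross-repo pass

# Hypothetical items for illustration.
ITEMS = [
    Item("api::hmac_auth::validate_signature", "api", "src/hmac_auth.rs"),
    Item("api::hmac_auth::handle", "api", "src/hmac_auth.rs"),
    Item("api::util::log_event", "api", "src/util.rs"),
    Item("billing::ledger::post", "billing", "src/ledger.rs"),
]
caller = ITEMS[1]
print(resolve_call(caller, "validate_signature", ITEMS))
print(resolve_call(caller, "log_event", ITEMS))
print(resolve_call(caller, "post", ITEMS))
```

The third lookup returns None because the only match lives in another repository; that is the kind of edge the initial parse leaves unresolved.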
Language support: The core parsing engine works with any language that has a tree-sitter grammar. Language-specific norms (qualified naming, module structure) are layered on top. Rust is fully supported today; other languages are straightforward to add.
The Structural Graph
The parsed items and edges form a graph stored in Postgres. This graph has two layers:
Deterministic layer — Produced by tree-sitter. Items and resolved edges. Same code, same graph. This layer is ground truth.
Inferred layer — Boundary nodes created by agent reasoning over unresolved edges. Each boundary node represents an external dependency (e.g., qdrant_client v1.12) with metadata: APIs used, configuration, failure impact, confidence score.
The unresolved edges are the bridge between layers. They belong to the deterministic layer (tree-sitter found them) but point at the inferred layer (the agent resolves them to boundary nodes).
Biography Generation
Each item in the graph gets a biography — a deterministic text summary of its structural role. Biographies are generated from the graph, not from AI. They include:
- Item name, type, location, line count
- What it calls (callees)
- What calls it (callers)
- External dependencies it touches
- Methods it contains (for structs/impls)
- Parent module and containment hierarchy
Biographies are stored in the annotations table and serve a dual purpose: they are the text that keyword search (tsvector) indexes, and they are the text that gets embedded for neural search.
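To show what "generated from the graph, not from AI" can mean in practice, here is a minimal Python sketch that renders a biography from call edges alone. The output format, the biography helper, and the sample edges are hypothetical; grepvec's real biographies contain more fields than this.

```python
# Hypothetical call edges (caller, callee) for illustration.
CALL_EDGES = [
    ("handle_request", "validate_signature"),
    ("validate_signature", "compute_hmac"),
    ("validate_signature", "constant_time_eq"),
]

def biography(name, kind, location, edges):
    """Render a plain-text summary from call edges alone; no model is involved."""
    callees = sorted(callee for caller, callee in edges if caller == name)
    callers = sorted(caller for caller, callee in edges if callee == name)
    lines = [
        f"{kind} {name} ({location})",
        "calls: " + (", ".join(callees) or "none"),
        "called by: " + (", ".join(callers) or "none"),
    ]
    return "\n".join(lines)

bio = biography("validate_signature", "function", "src/hmac_auth.rs:10-42", CALL_EDGES)
print(bio)
```

Sorting the names means the text depends only on the set of edges, not on the order they were stored in, so the same graph always produces the same biography.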
Two-Pass Search
Search uses two passes to combine precision with recall:
Pass 1: Keyword (tsvector) — Postgres full-text search over biographies. Fast, structural, no network calls. Finds items whose biographies contain the query terms. Good for exact names and specific technical terms.
Pass 2: Neural (vector embeddings) — The query is embedded with the same model used for biographies (BGE-large-en-v1.5 in local mode), then vector similarity search finds semantically related items. Good for natural language questions and conceptual queries.
If the vector backend is unavailable, Pass 1 still works. grepvec degrades gracefully — you lose semantic search but retain structural keyword search.
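The two passes and the fallback behavior can be sketched in a few lines of Python. This is a toy model under stated assumptions: term overlap stands in for Postgres tsvector matching, hand-written 2-d vectors stand in for real embeddings, and the biographies, vectors, and function names are all hypothetical.

```python
import math

# Hypothetical biographies and toy 2-d vectors; real vectors come from the embedding model.
BIOS = {
    "validate_signature": "function validate_signature checks hmac signature on request",
    "compute_hmac": "function compute_hmac derives hmac digest from secret",
    "render_page": "function render_page builds html response",
}
VECTORS = {
    "validate_signature": (0.9, 0.1),
    "compute_hmac": (0.8, 0.3),
    "render_page": (0.1, 0.9),
}

def keyword_pass(query, bios, limit):
    """Pass 1 stand-in: rank items by how many query terms their biography contains."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), name) for name, text in bios.items()]
    hits = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [name for _, name in hits[:limit]]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def neural_pass(query_vector, vectors, limit):
    """Pass 2 stand-in: rank items by cosine similarity to the query vector."""
    ranked = sorted(vectors, key=lambda name: -cosine(query_vector, vectors[name]))
    return ranked[:limit]

def search(query, query_vector=None, limit=10):
    """Run both passes; without a query vector, return keyword results only."""
    results = {"keyword": keyword_pass(query, BIOS, limit)}
    if query_vector is not None:
        results["neural"] = neural_pass(query_vector, VECTORS, limit)
    return results

print(search("hmac signature", query_vector=(1.0, 0.0), limit=2))
print(search("hmac signature", limit=2))  # vector backend unavailable: keyword pass only
```

Keeping the passes separate, each with its own limit, mirrors the per-pass --limit flag and makes the degraded mode a matter of omitting the second list.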
Stale item cleanup: each absorb run brings Postgres back to 1:1 with the codebase. When a function is deleted from source, the next absorb run removes it from the graph along with its edges and biography. No orphans, no drift.
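Conceptually, the cleanup is a set difference between what is stored and what the latest parse found. The Python sketch below illustrates that idea with in-memory lists; the helper names and sample data are hypothetical, and the real cleanup runs against Postgres tables.

```python
def stale_items(stored, parsed):
    """Items present in the database but absent from the latest parse."""
    return sorted(set(stored) - set(parsed))

def prune(stored_items, stored_edges, parsed):
    """Drop stale items and every edge that touches one of them."""
    stale = set(stale_items(stored_items, parsed))
    items = [i for i in stored_items if i not in stale]
    edges = [(a, b) for a, b in stored_edges if a not in stale and b not in stale]
    return items, edges

# Hypothetical state: legacy_check was deleted from source since the last run.
stored = ["handle_request", "validate_signature", "legacy_check"]
edges = [("handle_request", "validate_signature"), ("handle_request", "legacy_check")]
parsed = ["handle_request", "validate_signature"]

print(stale_items(stored, parsed))
print(prune(stored, edges, parsed))
```

Removing edges together with their endpoints is what prevents orphaned edges from lingering after a deletion.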