Installation

Install grepvec with a single command:

$ curl -fsSL https://grepvec.com/install.sh | sh

Or build from source:

$ git clone https://github.com/grepvec/grepvec.git
$ cd grepvec
$ cargo build --release

Requirements: Rust 1.75+, Docker (for local mode).

The install script places the grepvec binary in ~/.grepvec/bin and adds it to your PATH. It also installs grepvec-ui for visualization.

Quick Start

Get code intelligence for any project in four commands:

$ cd your-project
$ grepvec init   # provisions Docker Postgres + Qdrant + BGE
$ grepvec search "how does auth work"
$ grepvec context "validate_signature"
$ grepvec read "validate_signature"

grepvec init does everything: starts Docker containers, runs database migrations, parses your codebase with tree-sitter, generates biographies, and embeds them for neural search. After init, search works immediately.

Configuration

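grepvec uses two configuration files: a credentials file (~/.grepvec/credentials), which differs between local and managed mode, and a per-project scope file (.grepvec/scope.toml).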

Local mode credentials

# ~/.grepvec/credentials
[postgres]
url = "postgresql://grepvec:grepvec@localhost:5432/grepvec"

[vector]
backend = "local"
qdrant_url = "http://localhost:6333"
bge_url = "http://localhost:8080"

Managed mode credentials (Enscribe)

# ~/.grepvec/credentials
[postgres]
url = "postgresql://..."

[vector]
backend = "enscribe"

[enscribe]
api_key = "ensk_..."
base_url = "http://localhost:3000"

Scope file

The scope file tells grepvec which repositories to parse and track. It is generated by grepvec init and can be edited manually:

# .grepvec/scope.toml
[[repos]]
name = "my-project"
path = "."
last_sha = "a1b2c3d"

The Research Loop

The core workflow is three commands: search, context, read. Each one narrows the focus. The pattern eliminates context window pollution — your agent (or you) gets exactly the code that matters.

Start with a natural language question. grepvec runs a two-pass search: keyword (tsvector) then neural (vector embeddings). Results are structurally ranked — not just text matches.

$ grepvec search "how does authentication work"

--- Pass 1: Keyword Search ---
1. function api::hmac_auth::validate_signature (src/api/hmac_auth.rs) -- 42 LOC
   Called by: authenticate_request, verify_lab_signature
2. function api::hmac_auth::authenticate_request (src/api/hmac_auth.rs) -- 28 LOC
   Calls: validate_signature, extract_headers

--- Pass 2: Neural Search ---
1. [0.72] api::hmac_auth::validate_signature
   Validates HMAC-SHA256 signature for incoming API requests
2. [0.58] api::hmac_auth::HmacAuthError
   enum with variants: InvalidTimestamp, InvalidSignature, MissingHeader

You now know which items are relevant. Next, understand how they connect.

context — Understand connections

Given an item name, context returns its biography and graph neighborhood: callers, callees, parent, dependencies. This is the structural context that grep cannot provide.

$ grepvec context "validate_signature"

function api::hmac_auth::validate_signature
src/api/hmac_auth.rs:45-86 (42 LOC)

Validates an HMAC-SHA256 signature against a request body and timestamp.
Rejects requests with timestamps older than 5 minutes. Used as the primary
authentication mechanism for all API endpoints.

Calls (3):
  -> hmac::Mac::verify (unresolved)
  -> hex::decode (unresolved)
  -> chrono::Utc::now (unresolved)

Called by (2):
  <- authenticate_request (src/api/routes.rs:7)
  <- verify_lab_signature (src/api/rest.rs:35)

Parent: module api::hmac_auth

Now you know what the function does and how it connects. Time to read the source.

read — Get the source

Extracts the exact source code for an item — no more, no less. Your agent's context window gets the precise function body, not an entire 500-line file.

$ grepvec read "validate_signature"

function api::hmac_auth::validate_signature (src/api/hmac_auth.rs:45-86)

45  pub fn validate_signature(
46      secret: &[u8],
47      timestamp: &str,
48      body: &[u8],
49      signature: &str,
50  ) -> Result<(), HmacAuthError> {
51      let ts: i64 = timestamp.parse()
52          .map_err(|_| HmacAuthError::InvalidTimestamp)?;
53      let now = Utc::now().timestamp();
54      if (now - ts).abs() > 300 {
55          return Err(HmacAuthError::InvalidTimestamp);
56      }
..      ...

Three commands, zero noise. The agent has exactly the context it needs to reason correctly.


Commands Reference

Every grepvec command is a subcommand of the grepvec binary. Run grepvec --help for a full listing.

init

grepvec init [--repos <paths>] [--db-url <url>] [--local] [--managed] [--enscribe-key <key>] [--config-only]

Provisions infrastructure, parses your codebase, and gets search working in one command. Creates Docker containers (local mode), runs database migrations, absorbs code, generates biographies, and embeds them for neural search.

Flags
  • --repos <paths> — Comma-separated paths to repositories to include
  • --db-url <url> — Use an existing Postgres database instead of Docker
  • --local — Use local Docker containers for Postgres, Qdrant, and BGE (default)
  • --managed — Use Enscribe managed backend for vector search
  • --enscribe-key <key> — API key for Enscribe managed backend
  • --config-only — Write config files without parsing or embedding

$ grepvec init --local
Starting grepvec-db (Postgres 16)... done
Starting grepvec-qdrant (Qdrant)... done
Starting grepvec-bge (BGE-large-en-v1.5)... done
Running migrations... done
Absorbing 3 repos (847 files)... done (2.1s)
Generating biographies... done (1.4s)
Embedding biographies... done (8.2s)
Ready. Run `grepvec search` to get started.

refresh

grepvec refresh [--scope <path>]

Session-start hook. Incrementally absorbs changes since the last parse run and refreshes stale biographies. Instant when nothing has changed.

Flags
  • --scope <path> — Path to scope.toml (defaults to .grepvec/scope.toml)

$ grepvec refresh
Checking 3 repos for changes...
  my-project: 2 files changed since a1b2c3d
  Absorbed 2 files, 14 items updated
  Refreshed 8 stale biographies
Refresh complete (0.49s)
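
How grepvec detects changes internally is not documented here. One plausible approach, sketched below as an assumption, is to ask git for the files changed since the last_sha recorded in scope.toml and keep only those in a parsed language. Both function names are hypothetical.

```python
def git_changed_files_command(last_sha: str) -> list[str]:
    """Build a git invocation listing files changed since last_sha (assumed approach)."""
    return ["git", "diff", "--name-only", f"{last_sha}..HEAD"]


def files_to_reabsorb(diff_output: str, extensions: tuple[str, ...] = (".rs",)) -> list[str]:
    """Keep only changed paths whose extension belongs to a parsed language."""
    paths = [line.strip() for line in diff_output.splitlines() if line.strip()]
    return [path for path in paths if path.endswith(extensions)]


# Sample `git diff --name-only` output; a real run would capture it via subprocess.
sample_output = "src/api/hmac_auth.rs\nREADME.md\nsrc/api/routes.rs\n"

print(git_changed_files_command("a1b2c3d"))
print(files_to_reabsorb(sample_output))
```

When the diff is empty there is nothing to re-parse, which is consistent with refresh being instant when nothing has changed.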

absorb

grepvec absorb [--repo <name>] [--all] [--changed-since <sha>] [--dry-run] [--stats] [--migrate]

Parses source code with tree-sitter and stores the extracted items and edges in Postgres. The deterministic foundation of the structural graph.

Flags
  • --repo <name> — Absorb only a specific repository
  • --all — Absorb all repositories in scope
  • --changed-since <sha> — Only parse files changed since the given commit SHA
  • --dry-run — Parse but do not write to the database
  • --stats — Print summary statistics after absorption
  • --migrate — Run database migrations before absorbing

$ grepvec absorb --all --stats
Absorbing 4 repos...
  enscribe-embed: 1,842 items, 3,201 edges (228 files)
  enscribe-developer: 980 items, 1,544 edges (112 files)
  enscribe-observe: 812 items, 1,340 edges (87 files)
  enscribe-cli: 836 items, 2,500 edges (94 files)
Total: 4,470 items, 8,585 edges

document

grepvec document [--repo <name>] [--all] [--dry-run] [--sample <n>] [--stale-only]

Generates deterministic biographies for each item in the graph. Biographies include callers, callees, dependencies, parent module, and a structural summary. Stored in the annotations table.

Flags
  • --repo <name> — Generate biographies for a specific repository
  • --all — Generate for all repositories in scope
  • --dry-run — Generate but do not write to the database
  • --sample <n> — Generate biographies for a random sample of n items (useful for testing)
  • --stale-only — Only regenerate biographies marked as stale

$ grepvec document --all --stale-only
Generating biographies for stale items...
  142 stale items found
  142 biographies generated
Done (0.8s)

context

grepvec context <name> [--repo <name>] [--hops <n>]

Returns the biography and graph neighborhood for an item. Shows callers, callees, parent module, and structural relationships up to N hops away. This is the command that makes grep unnecessary for understanding code.

Flags
  • <name> — Item name (qualified or unqualified, e.g., validate_signature or api::hmac_auth::validate_signature)
  • --repo <name> — Disambiguate when the same name exists in multiple repos
  • --hops <n> — Graph neighborhood depth (default: 1)

$ grepvec context "validate_signature" --hops 2

function api::hmac_auth::validate_signature
src/api/hmac_auth.rs:45-86 (42 LOC)

Calls (3):
  -> hmac::Mac::verify (unresolved)
  -> hex::decode (unresolved)
  -> chrono::Utc::now (unresolved)

Called by (2):
  <- authenticate_request (src/api/routes.rs:7)
     <- handle_api_request (src/api/server.rs:42)
  <- verify_lab_signature (src/api/rest.rs:35)
     <- process_lab_webhook (src/api/rest.rs:10)
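
The effect of --hops can be pictured as a depth-limited breadth-first walk over caller edges. The sketch below uses the caller relationships from the example output. The data structure and function name are illustrative assumptions, not grepvec internals.

```python
from collections import deque

# Caller edges taken from the example output: callee -> direct callers.
CALLED_BY = {
    "validate_signature": ["authenticate_request", "verify_lab_signature"],
    "authenticate_request": ["handle_api_request"],
    "verify_lab_signature": ["process_lab_webhook"],
}


def callers_within(start: str, hops: int) -> dict[str, int]:
    """Breadth-first walk over caller edges; returns caller -> hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == hops:
            continue  # do not expand past the requested depth
        for caller in CALLED_BY.get(current, []):
            if caller not in distance:
                distance[caller] = distance[current] + 1
                queue.append(caller)
    del distance[start]  # the item itself is not one of its own callers
    return distance


print(callers_within("validate_signature", 1))
print(callers_within("validate_signature", 2))
```

With one hop only the two direct callers are returned; with two hops the callers of those callers are included as well.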

read

grepvec read <name> [--repo <name>] [-C <n>]

Extracts the exact source code for an item from disk. Returns only the function/struct/enum body with line numbers — not the entire file. Keeps the context window clean.

Flags
  • <name> — Item name (qualified or unqualified)
  • --repo <name> — Disambiguate when the same name exists in multiple repos
  • -C <n> — Lines of context around the item (default: 0)

$ grepvec read "HmacAuthError"

enum api::hmac_auth::HmacAuthError (src/api/hmac_auth.rs:12-18)

12  pub enum HmacAuthError {
13      InvalidTimestamp,
14      InvalidSignature,
15      MissingHeader(String),
16      ExpiredRequest,
17      MalformedPayload(String),
18  }
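
Conceptually, read slices a stored line range out of the file and optionally widens it by -C lines on each side. A minimal sketch of that slicing follows; extract_item is a hypothetical name and the numbering format is simplified.

```python
def extract_item(source: str, start: int, end: int, context: int = 0) -> str:
    """Return lines start..end (1-indexed, inclusive) plus `context` lines on
    each side, each prefixed with its line number."""
    lines = source.splitlines()
    first = max(1, start - context)          # clamp at the top of the file
    last = min(len(lines), end + context)    # clamp at the bottom of the file
    return "\n".join(f"{n} {lines[n - 1]}" for n in range(first, last + 1))


# A ten-line stand-in for a source file.
source = "\n".join(f"// line {n}" for n in range(1, 11))

print(extract_item(source, 4, 6))
print(extract_item(source, 4, 6, context=1))
```

The default of zero context lines matches the documented default for -C.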

reconcile

grepvec reconcile --edges [--dry-run] [--report]

Resolves cross-repo edges. When a function in repo A calls a function in repo B, the initial parse marks that edge as unresolved. Reconcile attempts to match these edges to items in other parsed repositories.

Flags
  • --edges — Reconcile edges (required)
  • --dry-run — Report what would be reconciled without writing
  • --report — Print detailed reconciliation report

$ grepvec reconcile --edges --report
Reconciling cross-repo edges...
  4,451 unresolved edges examined
  312 newly resolved (cross-repo matches)
  4,139 remaining unresolved (external dependencies)
Resolution rate: 47.7% (4,093 / 8,585)
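
The matching step can be sketched as a lookup of each unresolved target in the item sets of the other repositories. The exact-match rule, the treatment of ambiguous matches, and the item names below are assumptions for illustration; the source does not specify grepvec's matching rules.

```python
def reconcile_edges(unresolved, items_by_repo):
    """Match unresolved edge targets against qualified names from other repos.

    unresolved: list of (source_repo, caller, target_qualified_name)
    items_by_repo: repo name -> set of qualified item names
    Returns (resolved, remaining); resolved entries carry the matching repo.
    """
    resolved, remaining = [], []
    for source_repo, caller, target in unresolved:
        matches = sorted(
            repo for repo, names in items_by_repo.items()
            if repo != source_repo and target in names
        )
        if len(matches) == 1:
            resolved.append((source_repo, caller, target, matches[0]))
        else:
            # No match (external dependency) or ambiguous: leave unresolved.
            remaining.append((source_repo, caller, target))
    return resolved, remaining


# Hypothetical items; only the repo names come from the absorb example.
items_by_repo = {
    "enscribe-embed": {"embed::client::embed_batch"},
    "enscribe-cli": {"cli::run"},
}
unresolved = [
    ("enscribe-cli", "cli::run", "embed::client::embed_batch"),
    ("enscribe-cli", "cli::run", "tokio::spawn"),
]

resolved, remaining = reconcile_edges(unresolved, items_by_repo)
print(resolved)
print(remaining)
```

Edges that stay unresolved after this pass point at code outside every parsed repository, which is what the boundary command works on.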

boundary

grepvec boundary <gaps|list|create|resolve>

Manages inferred boundary nodes — representations of external systems your code depends on but you do not own. Boundary nodes are created by agent reasoning over unresolved edges.

Subcommands
  • gaps — Show unresolved edges grouped by external crate/module
  • list — List all existing boundary nodes with their metadata
  • create --from <json> — Create a boundary node from a JSON definition
  • resolve — Link unresolved edges to existing boundary nodes

$ grepvec boundary gaps
Unresolved edges by external crate:
  qdrant_client (142 edges) -- vector database client
  tonic (98 edges) -- gRPC framework
  reqwest (64 edges) -- HTTP client
  tokio (201 edges) -- async runtime
  serde (340 edges) -- serialization
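
A grouping like the one gaps reports can be approximated by taking the leading path segment of each unresolved target as the crate name. This is a sketch under that assumption; the targets are the unresolved calls shown elsewhere in this document, and gaps_by_crate is a hypothetical name.

```python
from collections import Counter


def gaps_by_crate(unresolved_targets):
    """Group unresolved edge targets by their leading path segment (the crate)."""
    counts = Counter(target.split("::", 1)[0] for target in unresolved_targets)
    return counts.most_common()  # highest edge count first


targets = [
    "hmac::Mac::verify",
    "hex::decode",
    "chrono::Utc::now",
    "hmac::Mac::new_from_slice",
]

for crate, edge_count in gaps_by_crate(targets):
    print(f"{crate} ({edge_count} edges)")
```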

embed

grepvec embed [--collection-id <id>] [--repo <name>] [--init] [--boundary-nodes] [--dry-run]

Bulk-embeds biographies into the vector backend for neural search. Works with both local (BGE + Qdrant) and managed (Enscribe) backends.

Flags
  • --collection-id <id> — Target collection ID
  • --repo <name> — Embed biographies for a specific repository only
  • --init — Create the collection, configure the voice, and embed in one step
  • --boundary-nodes — Include boundary node descriptions in the embedding
  • --dry-run — Count items that would be embedded without writing

$ grepvec embed --init
Creating collection 'grepvec-platform'... done
Configuring voice 'biography-search'... done
Embedding 2,209 biographies... done (6.4s)
Embedding 12 boundary nodes... done (0.2s)
Collection ready for neural search.

remember

grepvec remember <write|recall> [--lane <session|project|knowledge>] [--kind <decision|summary|error|trace>]

Agent memory system. Stores observations, decisions, and session summaries for persistence across agent sessions. Memories are stored in the vector backend and can be recalled by semantic query.

Subcommands
  • write — Store a memory (reads from stdin)
  • recall — Recall memories by semantic query

Flags
  • --lane <lane> — Memory lane: session (ephemeral), project (per-project), knowledge (permanent)
  • --kind <kind> — Memory kind: decision, summary, error, trace

$ echo "Auth uses HMAC-SHA256 with 5-min expiry" | \
    grepvec remember write --lane project --kind decision
Stored memory (project/decision)

$ grepvec remember recall "how does auth work"
1. [project/decision] Auth uses HMAC-SHA256 with 5-min expiry
2. [session/summary] Reviewed hmac_auth module, all tests passing

mcp-server

grepvec mcp-server

Starts the MCP (Model Context Protocol) server. Runs as a stdio JSON-RPC server that exposes grepvec commands as tools to any MCP-compatible AI agent. No flags — configuration comes from grepvec's config files.

$ grepvec mcp-server
MCP server listening on stdio
Tools: search, context, read, refresh, absorb, boundary, remember

Agent Integration

grepvec provides two discovery mechanisms for AI agents: MCP (tools appear natively in the agent's tool palette) and subagent instructions (a markdown file the agent reads on session start). Both work. MCP is lower friction; subagent instructions give you more control.

MCP Server

The MCP server exposes grepvec commands as native tools via the Model Context Protocol. Any MCP-compatible agent discovers grepvec tools automatically.

Add to your project's .mcp.json:

{
  "mcpServers": {
    "grepvec": {
      "command": "/path/to/grepvec",
      "args": ["mcp-server"]
    }
  }
}

The MCP server exposes these tools:

Tool      Maps to            Description
search    grepvec search     Two-pass structural search
context   grepvec context    Biography + graph neighborhood
read      grepvec read       Precise source extraction
refresh   grepvec refresh    Incremental absorb
boundary  grepvec boundary   External dependency analysis
remember  grepvec remember   Persistent agent memory
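
For a sense of what an agent sends over stdio, the sketch below builds a JSON-RPC 2.0 tools/call request, the MCP method for invoking a tool. The "query" argument name is an assumption; the actual input schema for each tool is whatever the server advertises in its tools/list response.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as a single line,
    suitable for a newline-delimited stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name used for illustration only.
line = tools_call_request(1, "search", {"query": "how does auth work"})
print(line)
```

A real session begins with the MCP initialize handshake before any tool call; MCP-compatible agents handle that automatically, so this is rarely written by hand.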

Subagent Instructions

For agents that do not support MCP, grepvec generates a markdown instruction file at .grepvec/agent.md. This file teaches the agent how to invoke grepvec via shell commands. Point your agent's system prompt or instruction file to it.

The instruction file includes:

Claude Code

Claude Code discovers grepvec via MCP. Add the .mcp.json to your project root and grepvec tools appear in Claude Code's tool palette. Alternatively, symlink the agent instructions:

$ mkdir -p .claude/agents
$ ln -s ../../.grepvec/agent.md .claude/agents/grepvec.md

Claude Code will automatically discover and use the subagent for code intelligence queries.

Cursor

Cursor supports MCP natively. Add the .mcp.json configuration and grepvec tools appear in the Composer tool list. You can also add the agent instructions to Cursor's rules:

$ cp .grepvec/agent.md .cursor/rules/grepvec.md

Codex

OpenAI Codex agents can use grepvec via shell commands. Include the agent instructions in your Codex system prompt or reference the instruction file:

$ cp .grepvec/agent.md codex-instructions/grepvec.md

The agent instructions are tool-agnostic — they work with any agent that can invoke shell commands.


Vector Backend

grepvec supports two vector backends for neural search. Keyword search (Pass 1) works with just Postgres and requires no vector backend at all.

Local (Docker)

The default. grepvec init --local starts three Docker containers:

Container       Image                           Port  Volume               Purpose
grepvec-db      postgres:16-alpine              5432  grepvec-pgdata       Structural graph storage
grepvec-qdrant  qdrant/qdrant:latest            6333  grepvec-qdrant-data  Vector similarity search
grepvec-bge     ghcr.io/huggingface/tei:latest  8080  grepvec-bge-cache    BGE-large-en-v1.5 embeddings

All data persists in Docker volumes. Stop and restart containers freely — nothing is lost. The BGE model is downloaded and cached on first run (about 1.3 GB).

Why BGE-large-en-v1.5? In our evaluation campaign (100 queries, 6 categories, graded relevance), BGE-large achieved NDCG@10 of 0.79 and MRR of 0.89 — measurably better than OpenAI and Voyage embeddings, and it runs locally for free.
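
For readers unfamiliar with the two metrics, the sketch below shows how NDCG@k and MRR are commonly computed from graded relevance judgments. It uses linear gain (relevance divided by log2 of rank + 1) and takes the ideal ordering over the retrieved list only. Both choices are assumptions, since the evaluation's exact formulation is not stated here.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k graded relevance scores."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the ranking divided by the DCG of the ideal (descending) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(rankings):
    """Average of 1/rank of the first relevant result per query (0 if none)."""
    total = 0.0
    for relevances in rankings:
        first = next((i for i, rel in enumerate(relevances, start=1) if rel > 0), None)
        total += 1.0 / first if first else 0.0
    return total / len(rankings)


# Each list holds the graded relevance of results in ranked order for one query.
print(ndcg_at_k([3, 2, 1]))                    # already in ideal order
print(ndcg_at_k([0, 0, 3]))                    # best result ranked third
print(mean_reciprocal_rank([[0, 1], [1, 0]]))  # first hit at rank 2, then rank 1
```

An MRR of 0.89 means the first relevant result usually sits at rank 1 or 2, while NDCG@10 also rewards placing the most relevant items highest within the top ten.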

Enscribe (Managed)

For teams that want zero infrastructure management. Enscribe handles embedding generation, vector storage, and search. You provide an API key; Enscribe handles the rest.

$ grepvec init --managed --enscribe-key "ensk_..."

Enscribe provides collections, voices (search profiles), and neural search ranking. grepvec is a consumer of Enscribe's API.

Comparison

Feature          Local (Docker)                 Enscribe (Managed)
Setup            One command, Docker required   API key, no infra
Embedding model  BGE-large-en-v1.5              Configurable
Cost             Free (your hardware)           $5/seat + metered
Latency          Local network                  Network round-trip
Data location    Your machine                   Cloud
Search voices    Default voice                  Custom voice profiles
Agent memory     Local Qdrant                   Managed lanes

Both backends are interchangeable. Switch between them by changing ~/.grepvec/credentials. The structural graph in Postgres is independent of the vector backend.


Visualization

grepvec-ui

A 3D force-directed graph visualization of your structural inventory, built with egui and backed by Postgres. Renders both deterministic items and inferred boundary nodes.

Launch directly or via hotkey:

$ grepvec-ui

The layout uses ForceAtlas2 — a force-directed algorithm that clusters related code items and pushes unrelated items apart. Heavily connected modules form visible clusters.

Controls

Action        Control
Orbit         Click and drag
Zoom          Scroll wheel
Select area   Area-select (drag box) to drill into a cluster
Inspect node  Click a node to view its biography and edges
Quit          Press Escape twice

Color Modes

Two coloring modes reveal different aspects of your codebase:

Layer mode — Colors by architectural tier. API handlers, storage layer, domain logic, and utilities each get a distinct color. Reveals the layered structure of your codebase.

IO mode — Colors by behavioral role. Items that read external data, write external data, do pure computation, or coordinate other items. Reveals data flow patterns.

Node shapes encode item type:

Shape          Item Type
Circle         Function
Rectangle      Struct
Diamond        Enum
Hexagon        Impl block
Ringed circle  Trait

Architecture

How Parsing Works

grepvec uses tree-sitter to parse source code into a concrete syntax tree, then walks that tree to extract structural items and their relationships.

Items extracted: functions, structs, enums, traits, impl blocks, modules, constants, type aliases. Each item gets a qualified name (e.g., api::hmac_auth::validate_signature), file location, and line range.

Edges extracted: calls (function A calls function B), implements (struct implements trait), contains (module contains function), imports (file imports name). Edges are initially resolved within the same file, then within the same repo.
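
The resolution order described above (same file first, then same repo, otherwise unresolved) can be sketched as a simple cascade. Matching on fully qualified names is a simplifying assumption, and resolve_call is a hypothetical name.

```python
def resolve_call(target: str, file_items: set[str], repo_items: set[str]) -> str:
    """Classify a call target: same file first, then same repo, else unresolved."""
    if target in file_items:
        return "resolved (same file)"
    if target in repo_items:
        return "resolved (same repo)"
    return "unresolved"


# Items defined in the calling file versus elsewhere in the same repository.
file_items = {"api::routes::authenticate_request"}
repo_items = file_items | {"api::hmac_auth::validate_signature"}

for target in ("api::hmac_auth::validate_signature", "hex::decode"):
    print(target, "->", resolve_call(target, file_items, repo_items))
```

Calls into external crates such as hex::decode fall through both lookups and stay unresolved until reconcile or boundary handling picks them up.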

Parsing is deterministic: same code, same graph, every time. No AI, no heuristics, no drift.

Language support: The core parsing engine works with any language that has a tree-sitter grammar. Language-specific norms (qualified naming, module structure) are layered on top. Rust is fully supported today; other languages are straightforward to add.

The Structural Graph

The parsed items and edges form a graph stored in Postgres. This graph has two layers:

Deterministic layer — Produced by tree-sitter. Items and resolved edges. Same code, same graph. This layer is ground truth.

Inferred layer — Boundary nodes created by agent reasoning over unresolved edges. Each boundary node represents an external dependency (e.g., qdrant_client v1.12) with metadata: APIs used, configuration, failure impact, confidence score.

Deterministic Layer (tree-sitter, identical on every run)

  validate_signature   --> hmac::Mac::verify
  validate_signature   --> hex::decode
  authenticate_request --> validate_signature
                           ^^ resolved (same repo)

Inferred Layer (agent-reasoned, confidence-scored)

  [hmac v0.12] <-- validate_signature
    APIs: Mac::new_from_slice, Mac::verify
    Role: cryptographic signature validation
    Confidence: 0.95

The unresolved edges are the bridge between layers. They belong to the deterministic layer (tree-sitter found them) but point at the inferred layer (the agent resolves them to boundary nodes).

Biography Generation

Each item in the graph gets a biography, a deterministic text summary of its structural role. Biographies are generated from the graph, not from AI. They include callers, callees, dependencies, the parent module, and a structural summary.

Biographies are stored in the annotations table and serve dual purpose: they are the text that keyword search (tsvector) indexes, and they are the text that gets embedded for neural search.
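
To illustrate what deterministic means here, the sketch below renders a biography purely from graph facts and sorts every list so the same graph always yields the same text. The field names and output layout are assumptions, not grepvec's actual biography format.

```python
def build_biography(item: dict, calls: list[str], called_by: list[str]) -> str:
    """Render a text biography from graph facts; sorting keeps the output
    identical across runs regardless of input order."""
    loc = item["end"] - item["start"] + 1
    lines = [
        f"{item['kind']} {item['name']}",
        f"{item['file']}:{item['start']}-{item['end']} ({loc} LOC)",
        f"Parent: {item['parent']}",
        "Calls: " + (", ".join(sorted(calls)) or "none"),
        "Called by: " + (", ".join(sorted(called_by)) or "none"),
    ]
    return "\n".join(lines)


item = {
    "kind": "function",
    "name": "api::hmac_auth::validate_signature",
    "file": "src/api/hmac_auth.rs",
    "start": 45,
    "end": 86,
    "parent": "module api::hmac_auth",
}
calls = ["hmac::Mac::verify", "hex::decode", "chrono::Utc::now"]
called_by = ["verify_lab_signature", "authenticate_request"]

print(build_biography(item, calls, called_by))
```

Because the text is a pure function of the graph, the keyword index and the embeddings only need regenerating when the underlying items or edges change.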

Two-Pass Search

Search uses two passes to combine precision with recall:

Pass 1: Keyword (tsvector) — Postgres full-text search over biographies. Fast, structural, no network calls. Finds items whose biographies contain the query terms. Good for exact names and specific technical terms.

Pass 2: Neural (vector embeddings) — The query is embedded with the same model used for biographies (BGE-large-en-v1.5 in local mode), then vector similarity search finds semantically related items. Good for natural language questions and conceptual queries.

"how does authentication work"
  |
  +-- Pass 1: tsvector on biographies (Postgres)
  |     Returns: items matching "authentication", "auth", "work"
  |     Speed: ~5ms
  |
  +-- Pass 2: vector similarity (Qdrant/Enscribe)
  |     Returns: semantically similar items
  |     Speed: ~50ms
  |
  +-- Merged, deduplicated, ranked

If the vector backend is unavailable, Pass 1 still works. grepvec degrades gracefully — you lose semantic search but retain structural keyword search.
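
The merge step and the fallback behaviour can be sketched as follows. The ranking policy (keyword hits first, then neural hits by descending score) is an assumption, since the source does not say how merged results are ordered; passing None for the neural results models an unavailable vector backend.

```python
def merge_results(keyword_hits, neural_hits=None):
    """Merge two ranked passes, keeping keyword order first and dropping duplicates.

    keyword_hits: item names ranked by Pass 1.
    neural_hits: (score, name) pairs from Pass 2, or None when the vector
                 backend is unavailable.
    """
    merged, seen = [], set()
    for name in keyword_hits:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    for _score, name in sorted(neural_hits or [], reverse=True):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


keyword = ["validate_signature", "authenticate_request"]
neural = [(0.58, "HmacAuthError"), (0.72, "validate_signature")]

print(merge_results(keyword, neural))
print(merge_results(keyword, None))  # degraded mode: keyword results only
```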

Stale item cleanup: Postgres is always 1:1 with the codebase. When a function is deleted from source, the next absorb run removes it from the graph along with its edges and biography. No orphans, no drift.
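
The cleanup amounts to a set difference between what the database holds and what the latest parse produced. The sketch below uses in-memory collections as stand-ins for the Postgres tables; the function name and the deleted item legacy_check are hypothetical.

```python
def remove_stale(db_items, parsed_items, edges, biographies):
    """Drop items absent from the latest parse, along with their edges and
    biographies. Returns (items, edges, biographies, stale)."""
    stale = db_items - parsed_items
    kept_items = db_items & parsed_items
    kept_edges = [
        (src, dst) for src, dst in edges
        if src not in stale and dst not in stale
    ]
    kept_bios = {name: text for name, text in biographies.items() if name not in stale}
    return kept_items, kept_edges, kept_bios, stale


db_items = {"api::hmac_auth::validate_signature", "api::hmac_auth::legacy_check"}
parsed_items = {"api::hmac_auth::validate_signature"}  # legacy_check was deleted
edges = [
    ("api::hmac_auth::legacy_check", "api::hmac_auth::validate_signature"),
    ("api::hmac_auth::validate_signature", "hex::decode"),
]
biographies = {name: f"biography of {name}" for name in db_items}

items, kept_edges, kept_bios, stale = remove_stale(db_items, parsed_items, edges, biographies)
print(sorted(stale))
print(kept_edges)
```

Edges pointing at external targets such as hex::decode survive, because only items that disappeared from source are treated as stale.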


Elastic License 2.0 · GitHub · crates.io

Built with Rust and tree-sitter.