Token economy: how I stopped wasting context windows with LLM memory
How loom-memory turns a Git repository into a persistent knowledge base for AI agents. Reduce token spend by giving models durable context instead of cold reads.
If you’ve used AI coding agents for anything beyond trivial tasks, you’ve seen the problem: every new session starts from scratch. The agent re-reads files it already analyzed, misses project conventions it learned yesterday, and burns tokens reconstructing context it should already have.
I built loom-memory to solve exactly this. The idea is simple: give AI agents durable memory so they stop paying the cold-start tax on every task.
The Problem: Forgetful Agents, Expensive Sessions
Most AI coding tools are stateless between sessions. Even within a single session, once the context window fills up, earlier analysis is lost. This means:
- Repeated file scanning: The agent reads the same 50 files to understand the project structure every single time.
- Lost conventions: Project-specific patterns, naming rules, and architectural decisions vanish between sessions.
- Token waste: A significant chunk of every session is spent just re-establishing baseline understanding.
- Inconsistent output: Without persistent memory, the agent might contradict decisions it made earlier.
On a real codebase, this isn’t theoretical. I was seeing 30-50% of tokens in every session go toward context reconstruction rather than actual work.
The Solution: A Living Knowledge Base
loom-memory turns a Git repository into a persistent, queryable knowledge base. After initialization, the repo contains:
_wiki/
00-Index.md
01-Architecture-Stack.md
02-Fonctionnalites-Actuelles.md
03-Regles-LLM.md
04-Code-Map.md
05-Call-Graph.md
_graph/
codebase.db # SQLite graph of files, imports, symbols
AGENTS.md # Project-specific agent instructions
docs/
decisions.md # Accumulated architectural decisions
pitfalls.md # Known gotchas and lessons learned
The core is a SQLite graph database built from static analysis:
- Indexed files with language metadata
- Exported symbols and import relationships
- Function call graphs (for TypeScript/JavaScript)
- Cross-language contract edges via annotations
- Local semantic search chunks for both code and wiki
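To make the graph idea concrete, here is a minimal sketch of how files, symbols, and imports could be stored and queried in SQLite. The table and column names, and the three helper functions, are assumptions chosen for illustration; they are not loom-memory's actual schema or API.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not loom-memory's real tables.
SCHEMA = """
CREATE TABLE files   (path TEXT PRIMARY KEY, language TEXT);
CREATE TABLE symbols (name TEXT, file TEXT REFERENCES files(path));
CREATE TABLE imports (importer TEXT REFERENCES files(path),
                      imported TEXT REFERENCES files(path));
"""

def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory graph with a few made-up files."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?)", [
        ("src/auth/reset.ts", "typescript"),
        ("src/email/send.ts", "typescript"),
        ("src/models/user.ts", "typescript"),
    ])
    conn.executemany("INSERT INTO symbols VALUES (?, ?)", [
        ("sendEmail", "src/email/send.ts"),
        ("User", "src/models/user.ts"),
    ])
    conn.executemany("INSERT INTO imports VALUES (?, ?)", [
        ("src/auth/reset.ts", "src/email/send.ts"),
        ("src/auth/reset.ts", "src/models/user.ts"),
    ])
    return conn

def find_symbol(conn, name):
    """Files that define the given symbol."""
    rows = conn.execute(
        "SELECT file FROM symbols WHERE name = ? ORDER BY file", (name,))
    return [r[0] for r in rows]

def find_dependencies(conn, path):
    """Files imported by the given file."""
    rows = conn.execute(
        "SELECT imported FROM imports WHERE importer = ? ORDER BY imported", (path,))
    return [r[0] for r in rows]

def find_dependents(conn, path):
    """Files that import the given file."""
    rows = conn.execute(
        "SELECT importer FROM imports WHERE imported = ? ORDER BY importer", (path,))
    return [r[0] for r in rows]

conn = build_demo_graph()
print(find_symbol(conn, "sendEmail"))
print(find_dependents(conn, "src/email/send.ts"))
```

Each lookup is a single indexed query, which is why answering "who imports this file?" from the graph costs far fewer tokens than reading the files themselves.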
MCP: Exposing Memory to Agents
The graph isn’t just for humans. loom-memory runs an MCP server that exposes repository knowledge directly to compatible AI tools:
| Tool | What it does |
|---|---|
| find_symbol | Locate where a symbol is defined |
| find_dependencies | What does this file import? |
| find_dependents | Who imports this file? |
| find_callers / find_callees | Function-level call graph queries |
| hotspots | Most-connected files in the codebase |
| cross_zone_deps | Cross-module dependency analysis |
| recent_changes | What changed recently |
| semantic_search | Natural language search over code and wiki |
| recommend_execution_mode | How to approach a task. Returns which files to read, the reasoning level, and the output mode (compact patch vs full code) |
That last one is the key to token economy. Instead of the agent guessing how to approach a task, recommend_execution_mode tells it:
- Which files to inspect first: no more reading the entire src/ directory
- What reasoning level is needed: quick fix or deep architectural change?
- What output mode to use: compact patch or full implementation?
The Recommended Workflow
User: "Add password reset email flow"
Agent → recommend_execution_mode("Add password reset email flow")
→ filesToInspect: ["src/auth/", "src/email/", "src/models/user.ts"]
→ reasoning: "medium"
→ outputMode: "compact_patch"
Agent → semantic_search("password reset token")
Agent → find_callers("sendEmail")
Agent → zoneSummary("src/auth/")
[Now the agent has context and only reads 5 files instead of 50]
Advise in Action
I tested the advise command on the anthropic-cookbook repo with a real task: “Add a new RAG pipeline using Pinecone”. Here is what it returned:
{
  "task": "Add a new RAG pipeline using Pinecone",
  "taskSize": "medium",
  "risk": "low",
  "recommendedReasoning": "medium",
  "contextStrategy": "memory_first_then_targeted_reads",
  "outputMode": "recipe",
  "filesToInspect": [
    "tool_use/context_engineering/research_corpus.py",
    "tool_use/utils/visualize.py",
    "managed_agents/cma-mcp/src/server-http.ts",
    "claude_agent_sdk/site_reliability_agent/infra_setup.py",
    "managed_agents/self_hosted_sandboxes/modal/modal_sandbox_webhook.py"
  ]
}
Instead of scanning the entire repo, the agent gets 5 specific files to inspect. The contextStrategy is memory_first_then_targeted_reads: use the graph and semantic search first, then read only the files that matter. The outputMode is recipe, meaning the agent should return a step-by-step plan rather than a full implementation.
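As a rough illustration of how an agent harness might act on a response like this, the sketch below turns the JSON into an ordered list of steps. The `reading_plan` helper, the wording of the steps, and the `"full"` fallback are hypothetical; only the field names and values come from the response above, trimmed to the first two files.

```python
import json

# Trimmed copy of the advise response shown above (first two files only).
ADVISE_OUTPUT = """
{
  "task": "Add a new RAG pipeline using Pinecone",
  "contextStrategy": "memory_first_then_targeted_reads",
  "outputMode": "recipe",
  "filesToInspect": [
    "tool_use/context_engineering/research_corpus.py",
    "tool_use/utils/visualize.py"
  ]
}
"""

def reading_plan(advice: dict) -> list[str]:
    """Turn an advise response into ordered agent steps (hypothetical helper)."""
    steps = []
    # Memory-first: consult the graph and search index before opening files.
    if advice.get("contextStrategy") == "memory_first_then_targeted_reads":
        steps.append("query the graph and semantic search")
    # Then read only the files the recommendation names.
    steps.extend(f"read {path}" for path in advice.get("filesToInspect", []))
    # "full" is an assumed fallback when no output mode is given.
    steps.append(f"answer in {advice.get('outputMode', 'full')} mode")
    return steps

plan = reading_plan(json.loads(ADVISE_OUTPUT))
for step in plan:
    print(step)
```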
Token Savings in Practice
I ran loom-memory’s built-in benchmark on two real projects to measure the actual token impact. The benchmark builds the SQLite graph, then compares reading every file cold versus retrieving only the relevant memory chunks.
Test 1: anthropic-cookbook (111 files, Python/TS)
A medium-sized repo with 667 symbols and 378 semantic search chunks.
| Metric | Cold Start | With loom-memory |
|---|---|---|
| Tokens to read all files | 223,780 | 20,598 (top 8 chunks) |
| Reduction | baseline | 90.8% |
| Symbol lookup accuracy | n/a | 100% (20/20 probes) |
| Files indexed | 111 | 111 |
| Search chunks available | n/a | 378 |
Test 2: next.js (20,685 files, TypeScript)
A large production codebase with 24,991 symbols and 40,558 search chunks.
| Metric | Cold Start | With loom-memory |
|---|---|---|
| Tokens to read all files | 18,137,779 | 1,653,818 (top 8 chunks) |
| Reduction | baseline | 90.9% |
| Symbol lookup accuracy | n/a | 100% (20/20 probes) |
| Files indexed | 20,685 | 20,685 |
| Search chunks available | n/a | 40,558 |
What this means
Both projects show the same pattern: loom-memory cuts the token cost of context acquisition by roughly 90%. The agent spends its tokens on doing the work instead of re-learning the project.
The key metric is the “understanding check”: loom-memory probes the semantic search index by picking 20 random symbols and verifying that a search for each one returns the file where it is defined. Both projects scored 100% (20/20), which suggests the memory layer reliably points the agent to the right files, though 20 probes per repository is a small sample.
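The probe described above can be sketched in a few lines. This is a simplified illustration, not the benchmark's actual code: `understanding_check`, its parameters, and the toy symbol table are all assumptions, and the `search` callable stands in for whatever the semantic search index returns.

```python
import random

def understanding_check(symbols, search, probes=20, seed=0):
    """Sample symbols and count how many searches return the defining file.

    symbols: dict mapping symbol name -> file where it is defined
    search:  callable mapping a symbol name -> list of candidate files
    Returns (hits, probes_run).
    """
    rng = random.Random(seed)          # fixed seed keeps the sample repeatable
    names = sorted(symbols)
    sample = rng.sample(names, min(probes, len(names)))
    hits = sum(1 for name in sample if symbols[name] in search(name))
    return hits, len(sample)

# Toy data: 25 made-up symbols spread over 5 made-up files.
toy_symbols = {f"fn_{i}": f"src/mod_{i % 5}.py" for i in range(25)}

def perfect_search(name):
    # Stand-in for an index that always finds the defining file.
    return [toy_symbols[name]]

hits, total = understanding_check(toy_symbols, perfect_search)
print(f"{hits}/{total} probes resolved")
```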
For next.js, the cold-read cost is about 18.1 million tokens. With loom-memory, the agent retrieves the top 8 chunks, totalling about 1.65 million tokens, and still finds every symbol it looks up. That is roughly an 11x reduction in context cost, with all 20 symbol probes still resolving correctly.
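The percentages and the 11x figure follow directly from the token counts in the two tables. A short check, using a hypothetical `reduction` helper:

```python
def reduction(cold_tokens, memory_tokens):
    """Return (percent saved, reduction factor), each rounded to one decimal."""
    saved = 1 - memory_tokens / cold_tokens
    factor = cold_tokens / memory_tokens
    return round(saved * 100, 1), round(factor, 1)

# Token counts taken from the benchmark tables above.
cookbook = reduction(223_780, 20_598)
nextjs = reduction(18_137_779, 1_653_818)

print(cookbook)  # (90.8, 10.9)
print(nextjs)    # (90.9, 11.0)
```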
Keeping Memory Fresh
Memory is only useful if it stays current. loom-memory installs a post-commit hook that:
- Detects which zones changed
- Incrementally updates only the affected wiki sections
- Rebuilds the SQLite graph for changed files
- Updates AGENTS.md with new conventions or pitfalls
The wiki is also section-level incremental. If you change the auth module, only the auth section of the architecture doc gets regenerated, not the entire wiki.
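The first step, detecting which zones changed, could look something like the sketch below. It assumes a zone is simply the first two directory levels of a path, which is a guess made for illustration; loom-memory's real zone definition may differ. `git diff-tree` is a standard Git command that lists the files touched by a commit.

```python
import subprocess
from pathlib import PurePosixPath

def changed_files(repo="."):
    """List files touched by the latest commit, as a post-commit hook might."""
    out = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def zones_for(paths, depth=2):
    """Map changed paths to zones (assumed here: first `depth` directories)."""
    zones = set()
    for path in paths:
        dirs = PurePosixPath(path).parts[:-1]      # drop the file name
        zones.add("/".join(dirs[:depth]) or ".")   # "." means repository root
    return sorted(zones)

# Example with made-up paths; a hook would pass changed_files() instead.
print(zones_for([
    "src/auth/reset.ts",
    "src/auth/tokens/hash.ts",
    "src/email/send.ts",
    "README.md",
]))
```

Only the zones in the result would need their wiki sections and graph rows refreshed; everything else is left untouched.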
Current Status
loom-memory is an alpha prototype with a working CLI and MCP server. It supports:
- TypeScript/JavaScript (full AST parsing)
- Python, PHP, Ruby (Tree-sitter with regex fallbacks)
- Local semantic search with deterministic embeddings
- Paid-provider cost estimation for dry runs
- GitHub Actions workflow generation
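For readers wondering what "deterministic embeddings" can mean in practice, one common local technique is feature hashing: each token is hashed into a fixed-size vector, so the same text always produces the same vector with no model or network call. The sketch below shows that technique as an assumption; the post does not say how loom-memory computes its embeddings, and `embed`, `cosine`, `search`, and the dimension of 64 are illustrative choices.

```python
import hashlib
import math
import re

DIM = 64  # arbitrary vector size for this sketch

def embed(text, dim=DIM):
    """Hash each token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity for vectors that are already normalized."""
    return sum(x * y for x, y in zip(a, b))

def search(query, chunks, top_k=3):
    """Rank text chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Same input, same vector, on every run and every machine.
print(embed("password reset token") == embed("password reset token"))
```

The trade-off is that hashed vectors capture token overlap rather than meaning, which is cheap and reproducible but weaker than a learned embedding model.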
What’s next: publishing to npm, testing on more real-world repositories, and scoping the roadmap from actual usage patterns.
Design Principles
- Local-first: Repository knowledge lives with the repository, not in a cloud service.
- Stack agnostic: Works across JS, Python, PHP, Ruby, and mixed codebases.
- Small-model friendly: Compress repeated context into reusable maps and graph queries.
- Agent friendly: Expose facts through MCP instead of forcing agents to guess.
- Human readable: Generated memory is useful in a normal editor, not only through a tool.
- Self-improving: Decisions and pitfalls accumulate as the codebase evolves.
Try It
git clone https://github.com/jguillaumesio/loom-memory
cd loom-memory
npm install
# Initialize a repository
node bin/cli.js init ./path/to/your/repo
# Check what it generated
node bin/cli.js status ./path/to/your/repo
# Ask it how to approach a task
node bin/cli.js advise ./path/to/your/repo "Add user notification preferences"
The project is open source and on GitHub: jguillaumesio/loom-memory