MCP is everywhere now — and so is its oldest constraint. How a transparent caching proxy gets any MCP server past the 25,000-token response limit.
- #mcp
- #ai-infrastructure
- #caching
- #open-source
writing
Notes from production — AI systems, performance, and the infrastructure underneath.
MCP is everywhere now — and so is its oldest constraint. How a transparent caching proxy gets any MCP server past the 25,000-token response limit.
How a 24/7 AI agent fleet stays affordable on one subscription: deterministic code handles every tick, and the model only runs on real signals.
My fleet dashboard quietly degraded to 4.18s. The cause: one COUNT(*) full-scanning 258k rows on every load. One index later: ~18ms, flat forever.