Swapnil Surdi
I build production AI systems — RAG pipelines, agentic fleets, and the backend
infrastructure that keeps them fast, cheap, and reliable.
[email protected] · github · linkedin
[Diagram: request path through client, gateway, API, Postgres, queue, workers, and cache; RAG stage (embed, vector DB, BM25, rerank); inference stage (LLM: Claude, GPT, Gemini); watchdog and alerts; traces · metrics · logs]
−30–50% LLM cost (mcp-cache)
github · swapnilsurdi
16 repos · 9 stars · 698 contributions/yr
[Contribution heatmap: daily GitHub contributions, May to June 2026, shaded less to more]
claude code · this machine · peak 467m
5.3b tokens total · ~146m/day (30d avg)
[Token usage heatmap: daily Claude Code tokens, May to June 2026, shaded less to more]
as of 10 jun 2026
▣ live · 3 nodes · 22 containers
Three recycled laptops, each operated by its own headless Claude Code agent: a private 22-container homelab that monitors, heals, and reports on itself.
288 watchdog runs/day, zero tokens · 4.18s → 18ms status query · 22 containers
▣ npm · @hapus/mcp-cache · ★9
A transparent proxy that caches oversized MCP tool responses and hands the model query tools — so any MCP server works past the 25K-token wall.
25K → unlimited token wall · −30–50% LLM API cost · <200ms cached query
▣ production · HIPAA · 4 yrs
Production agentic RAG over docs, code, Confluence, and Jira for a HIPAA/ISO 13485 platform — compliance retrieval 30s → sub-second, verification 60% faster.
30s → <1s compliance retrieval · 60% faster verification
MCP is everywhere now, and so is its oldest constraint. How a transparent caching proxy gets any MCP server past the 25,000-token response limit.
#mcp #ai-infrastructure #caching #open-source
How a 24/7 AI agent fleet stays affordable on one subscription: deterministic code handles every tick, and the model only runs on real signals.
#ai-agents #automation #llmops #self-hosting
My fleet dashboard quietly degraded to 4.18s. The cause: one COUNT(*) full-scanning 258k rows on every load. One index later: ~18ms, flat forever.
#sqlite #performance #go #war-story
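The index fix in the SQLite story can be sketched roughly as follows. This is a minimal Python illustration, not the dashboard's actual code (which the tags suggest is Go). The `events` table, its `status` column, and the index name `idx_events_status` are hypothetical, and the sketch assumes the slow query was a filtered COUNT(*). It builds a 258k-row table in memory and compares the query plan before and after adding an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Hypothetical schema standing in for the dashboard's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, node TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO events (node, status) VALUES (?, ?)",
    (
        (f"node-{i % 3}", "alert" if i % 50 == 0 else "ok")
        for i in range(258_000)
    ),
)

count_sql = "SELECT COUNT(*) FROM events WHERE status = ?"

# Without an index on status, SQLite has to visit every row.
plan_before = query_plan(conn, count_sql, ("alert",))
count_before = conn.execute(count_sql, ("alert",)).fetchone()[0]

# With the index, the count is answered from the index alone.
conn.execute("CREATE INDEX idx_events_status ON events (status)")
plan_after = query_plan(conn, count_sql, ("alert",))
count_after = conn.execute(count_sql, ("alert",)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies by SQLite version, but the first plan should report a scan of the table and the second a search using the new index, with the same count returned both times.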
Looking for the full picture — roles, stack, and the numbers behind the work?
View resume →