Swapnil Surdi

AI engineer & backend engineer — production RAG, agentic systems, LLMOps; 10 years building software.

San Jose, California · [email protected] · LinkedIn · GitHub · Google Scholar

Skills

AI/ML Systems: LLM integration · RAG pipelines · agentic workflows · MCP (Model Context Protocol) · vector databases (Qdrant, ChromaDB) · embeddings & retrieval strategies · LLM evaluation · prompt engineering · LangChain, LlamaIndex
LLMs: Claude (Anthropic) · ChatGPT (OpenAI) · Gemini (Google) · Llama (Meta) · Qwen (Alibaba)
Languages: Python (primary) · Go · TypeScript/JavaScript
Backend & APIs: FastAPI · Node.js · REST · GraphQL · async processing · production inference · Kafka, RabbitMQ, SQS
Cloud & Infrastructure: AWS (Lambda, Fargate, ECS, S3, DynamoDB, API Gateway) · GCP (Cloud Run, Cloud SQL, Cloud Tasks) · Docker · Kubernetes · Terraform · production observability
Databases: PostgreSQL · MongoDB · Redis · DynamoDB · SQLite
Compliance: HIPAA · GDPR · ISO 13485 · IEC 62304

Experience

Senior Software Engineer (AI Systems Focus)

Treatment Technologies & Insights · Los Angeles, CA · Sept 2021 – July 2025

Architected a production agentic RAG system with MCP tools, enabling multi-hop reasoning across documentation, the codebase, Confluence, and Jira over 768-dimensional embeddings stored in Qdrant; accelerated compliance verification workflows by 60% and cut documentation retrieval time from 30 s to sub-second.
Built the end-to-end RAG pipeline behind it: vector database optimization, retrieval strategy tuning (hybrid BM25 + vector search, similarity thresholds), and prompt engineering for multi-step reasoning over compliance documents, with retrieval quality validated using NDCG and MRR.
Designed and deployed an LLM evaluation framework comparing ChatGPT, Claude, Gemini, Llama, and Qwen for production workflows — trading off p50/p95 latency, cost-per-token, accuracy, and capability fit for healthcare compliance use cases.
Engineered a 0-to-1 HIPAA-compliant microservices platform on AWS handling 100K+ daily API requests at 99.9% uptime, with standardized error handling and distributed tracing across core services.
Developed FastAPI-based inference servers with async processing, batch optimization, and connection pooling for LLM request routing — handling rate limiting and failover across multiple providers (OpenAI, Anthropic, Google).
Implemented distributed error tracking with partial-UUID correlation across frontend, backend, and AI services — a centralized watchdog and real-time alerting cut mean time to resolution by 70%.
Architected a JWT-based multi-tenant configuration system enabling a single-deployment AI platform to serve multiple enterprise clients with isolated model configurations, reducing infrastructure costs by 60%.
Reduced MRI image load times from 8 minutes to 30 seconds via CDN delivery from S3 and an indexed image viewer, with geography-aware caching that respects GDPR data residency requirements.
Cut patient-list API response time from 5+ seconds to under 1 second through database indexing and a caching strategy.
Mentored junior and mid-level engineers; introduced pair programming, TDD practices, and AI coding tools across the team.

Independent Software Consultant

Self-employed · San Jose, CA / Remote · 2025 – Present

Building production AI features — RAG retrieval, agentic workflows, and LLM evaluation — for a consumer health platform.
Providing backend architecture and performance engineering for a payments platform in India, designed for a peak load of 200 transactions per second.
Developing open-source AI infrastructure (MCP-Cache, email-mcp) and LaunchLab Fleet, a self-hosted multi-agent operations platform.

Application Developer

Apprely Technologies · Pune, India · Oct 2016 – June 2018

Built a financial platform on a Django/Python backend integrating Plaid, Stripe, and OCR, with asynchronous Kafka/RabbitMQ processing for real-time transaction handling.
Developed Android applications with REST API integration, handling authentication flows and multi-provider data aggregation.

Associate Software Engineer

Accenture · Pune, India · Oct 2015 – Oct 2016

Built automated testing frameworks in Python/Selenium, reducing test execution from weeks to hours — early production experience in quality validation and automation.

Selected Projects

LaunchLab Fleet — Self-hosted homelab and three-node AI agent fleet: a ~22-container Docker stack operated by headless Claude Code agents that coordinate over self-hosted Matrix, report to a custom Go + SQLite status hub, and remediate incidents using an allowlist of safe actions. Deterministic checks run every tick; the model is invoked only on real signals.
MCP-Cache — Transparent caching proxy that lets any MCP server return responses larger than client token limits, with a query interface over cached payloads (text/JSONPath/regex); cuts LLM API cost by 30–50%. Published on npm as @hapus/mcp-cache.
email-mcp — MCP server giving agents safe multi-account email: scoped per-agent keys, owner-approved sends, allow/block-lists as the last line of defense; powers unattended agent reporting in LaunchLab Fleet.
SmartContext (in development) — Context-aware multi-turn conversational AI using LlamaIndex, MCP tool orchestration, and dynamic response generation with Claude.

Education

M.S., Astronautical Engineering — University of Southern California · 2018 – 2020
B.E., Electronics & Telecommunications — Savitribai Phule Pune University · 2011 – 2015

Patent & Publications

Patent pending: US20240129000A1 — Fixed Base Station Antenna System using MIMO Configuration — directional + omnidirectional MIMO antenna design for disaster-resilient mesh networks (independent R&D).
Publication: Space Situational Awareness through Blockchain Technology — Journal of Space Safety Engineering, 2020.

Certifications

Lean Six Sigma Green Belt (USC Marshall)
HIPAA Compliance Certification
GDPR Data Security Certification