Senior Software Engineer (AI Systems Focus)
Treatment Technologies & Insights · Los Angeles, CA · Sept 2021 – July 2025
- Architected a production agentic RAG system with MCP tools enabling multi-hop reasoning across documentation, codebase, Confluence, and Jira, using 768-dimensional embeddings stored in Qdrant; accelerated compliance verification workflows by 60% and cut documentation retrieval from 30 seconds to under 1 second.
- Built the end-to-end RAG pipeline behind it: vector database optimization, retrieval strategy tuning (hybrid BM25 and dense retrieval, similarity thresholds), and prompt engineering for multi-step reasoning over compliance documents, with retrieval quality validated using NDCG and MRR.
- Designed and deployed an LLM evaluation framework comparing ChatGPT, Claude, Gemini, Llama, and Qwen for production workflows, weighing p50/p95 latency, cost per token, accuracy, and capability fit for healthcare compliance use cases.
- Engineered a 0-to-1 HIPAA-compliant microservices platform on AWS handling 100K+ daily API requests at 99.9% uptime, with standardized error handling and distributed tracing across core services.
- Developed FastAPI-based inference servers with async processing, batch optimization, and connection pooling for LLM request routing — handling rate limiting and failover across multiple providers (OpenAI, Anthropic, Google).
- Implemented distributed error tracking with partial-UUID correlation across frontend, backend, and AI services — a centralized watchdog and real-time alerting cut mean time to resolution by 70%.
- Architected a JWT-based multi-tenant configuration system enabling a single-deployment AI platform to serve multiple enterprise clients with isolated model configurations, reducing infrastructure costs by 60%.
- Reduced MRI image load times from 8 minutes to 30 seconds via CDN-based delivery and an indexed image viewer on S3, with geography-aware caching respecting GDPR data residency.
- Cut patient-list API response time from 5+ seconds to under 1 second through database indexing and caching.
- Mentored junior and mid-level engineers; introduced pair programming, TDD practices, and AI coding tool adoption across the team.