About me
I'm an India-based AI engineer with close to a decade across the stack, now focused on production LLM systems, agents, evals, and tool design.
Not actively looking for roles, but open to exciting opportunities.
- Base: India, working globally
- Current work: Production AI systems
- Operating model: Fractional CTO or embedded AI engineering lead
What's driven most of my career is wanting to build the whole thing. Not just the code, but the product, the system, and the team around it. That instinct is why I went broad before I went deep, why I started a studio, and why AI engineering is where I've been spending most of my time.
I came up as a full-stack engineer the slow way: shipping production code across frontend, backend, infra, and design before any of the leverage we have today existed. That grounding, in hindsight, calibrates everything I do now. You can tell when an agent is hallucinating a shape that wouldn't survive in production. You know which abstractions are load-bearing and which are decoration. You stop being impressed by demos.
My read on AI engineering has been the same for a while: the model is rarely the hard part. The hard part is the system around it: its context surface, its tool budget, the evals that catch regressions, the fallbacks that hold when a call fails, the cost-and-latency math that decides whether the feature is shippable at scale. Most production AI breaks in the wild not because the model is wrong, but because nothing around it was built carefully enough to make the model's mistakes recoverable.
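To make "recoverable" concrete, here is a minimal sketch of one such fallback, not a description of any particular system I've shipped. It assumes a hypothetical `call_model` function standing in for whatever provider client is in use. The names `Result` and `call_with_fallback` are illustrative too. The idea: retry a failed call a bounded number of times, then degrade to a fixed answer instead of surfacing the error.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    text: str
    source: str    # "model" or "fallback"
    attempts: int  # how many model calls were made


def call_with_fallback(
    call_model: Callable[[str], str],  # hypothetical: any function that calls a model
    prompt: str,
    fallback_text: str,
    max_attempts: int = 3,
    base_delay_s: float = 0.0,
) -> Result:
    """Try the model a bounded number of times, then degrade to a fixed answer."""
    for attempt in range(1, max_attempts + 1):
        try:
            return Result(call_model(prompt), "model", attempt)
        except Exception:
            # Exponential backoff between attempts; skipped after the last one.
            if attempt < max_attempts:
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    # Every attempt failed: return the deterministic fallback instead of raising.
    return Result(fallback_text, "fallback", max_attempts)
```

The caller always gets a `Result` and can tell from `source` whether the product should render a model answer or a degraded one.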
The arc, in shorthand: started in Mumbai at MAQ Software, which gave me my first real exposure to engineering at scale. Moved to SalesHandy as a senior engineer and team lead, where I learned to ship product. Then RAx Labs, where I went deep into platform and built an eight-engineer team from scratch. After that I broadened back out as an independent contractor at Gartner, Blueland, and Mission Sustainability. BigCircle started in 2022 inside that mix, as a place to do AI engineering at the bar I wanted to hold myself to. Most of my work since has anchored there.
Right now my attention is mostly on agent architectures, eval-driven development, and the systems engineering of shipping LLM-powered products that hold up at scale. The work that pulls me in hardest is the kind where the AI is the hard part of the product, not a feature bolted on. Lately I've been spending most of my time on the unglamorous side: tooling, evals, fallbacks, infrastructure. The work that decides whether a model-powered feature is shippable or just impressive.
Off the clock, I'm still an engineer at heart. I love tinkering, and I obsess over dev tools, sharper workflows, and small side projects that are way overengineered for what they actually do. Most of them never ship. They keep the craft sharp anyway.
Frequently asked
- What do you actually build?
  Production LLM features, agent loops, MCP tool servers, retrieval pipelines, evals, and the product surface around them. The model is rarely the hard part; the system around it usually is.
- Where are you based, and who do you work with?
  Based in India and working globally. Most recent engagements have been with US-based AI startups and US-headquartered enterprises, with overlap windows for sync time.
- How is this different from a generalist software engineer?
  I came up as a generalist and still operate full-stack when needed. The focus today is the parts of LLM-powered products that most engineering teams undercount: tool design, context engineering, evals, fallbacks, latency and cost shaping, and the infra around model calls.
- What stack do you typically work in?
  Python and TypeScript on the model and product side. LangChain, LlamaIndex, MCP, pgvector, Pinecone, LoRA fine-tuning, Langfuse for evals, Next.js or Astro on the product surface, AWS or Cloudflare for the runtime. The stack changes with the problem; the discipline does not.
- How do you decide whether an AI feature is shippable?
  A feature is shippable when its evals are honest, its failure modes degrade gracefully, its cost and latency stay inside a budget at the traffic you actually expect, and the product around it is still usable when the model is wrong.
- What is the fastest way to figure out fit?
  Read the Work page for shipped systems, skim two posts under Writing, then reach out via LinkedIn or GitHub. A short async exchange almost always tells both sides whether the next call is worth booking.
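The shippability answer above reads naturally as a checklist. Here is a minimal sketch of it; every name and threshold is a hypothetical placeholder, not a real budget. Whether the evals are honest is a judgment call that code cannot check, so the sketch only tests the pass rate against a bar.

```python
from dataclasses import dataclass


@dataclass
class FeatureReport:
    eval_pass_rate: float        # fraction of eval cases passing, 0.0 to 1.0
    p95_latency_ms: float        # measured at the traffic you expect
    cost_per_request_usd: float
    requests_per_day: int        # expected traffic
    degrades_gracefully: bool    # product stays usable when the model is wrong


@dataclass
class Budget:
    min_eval_pass_rate: float
    max_p95_latency_ms: float
    max_daily_cost_usd: float


def shippability_blockers(report: FeatureReport, budget: Budget) -> list[str]:
    """Return the reasons a feature is not shippable; an empty list means ship."""
    blockers = []
    if report.eval_pass_rate < budget.min_eval_pass_rate:
        blockers.append("evals below bar")
    if report.p95_latency_ms > budget.max_p95_latency_ms:
        blockers.append("latency over budget")
    # Cost is judged at expected traffic, not per request in isolation.
    daily_cost = report.cost_per_request_usd * report.requests_per_day
    if daily_cost > budget.max_daily_cost_usd:
        blockers.append("cost over budget")
    if not report.degrades_gracefully:
        blockers.append("no graceful degradation")
    return blockers
```

Returning the list of blockers rather than a single boolean keeps the conversation about what to fix, not just whether to ship.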
When this is not the right fit
The honest version. Skip me if any of the following sound like your situation.
- You need a thin wrapper around an OpenAI call with no eval or production discipline. A lighter contractor will be cheaper and faster.
- The AI feature is a demo with no traffic plan. A lot of systems engineering is premature at that stage.