I spent my first year in tech as an SDET — writing test cases, running regression suites, filing bug reports. Good work. Predictable work.
Then AI started showing up everywhere. Features built on LLMs. Agents making decisions. RAG systems answering questions. All of it shipping to production with almost no real evaluation strategy.
That gap bothered me.
So I started learning. Not ML research — the engineering side. How do you test a system that doesn't return the same answer twice? How do you measure quality when there's no ground truth? How do you catch regressions in a probabilistic system?
I built my first eval framework. Then another. Then I started writing about it.
Now I do this full time at Mercedes-Benz Research & Development India in Bangalore — building evaluation and testing infrastructure for LLM systems from scratch.
This site is where I think out loud. Not polished takes. Working notes from someone in the middle of figuring it out.
Outside work — Royal Enfield Bullet 350, roads that don't show up on Google Maps, and finding the quietest Shiva temples in Karnataka.