Location
India
Posted
June 10, 2026
Job Description
Data Scientist
Design and implement end-to-end evaluation frameworks to assess performance, reliability, and safety of multi-agent AI systems
Lead experimentation and A/B testing efforts to systematically test hypotheses, validate model improvements, and track performance across agent iterations
Curate and maintain high-quality ground truth datasets to enable accurate, reproducible evaluation of multi-agent outputs
Identify and address reliability and accuracy gaps across agent workflows, failure modes, and edge cases in production-like environments
Stay current on emerging research in agentic AI, LLM evaluation, and multi-agent coordination to continuously improve framework design
Technical Skills
Proficiency in Python and ML frameworks
Hands-on experience with LLM APIs and agentic frameworks (LangChain, LlamaIndex, Semantic Kernel)
Familiarity with evaluation tooling (Ragas, DeepEval, LangSmith, or similar)
Experience with data pipelines, experiment tracking (M...