Working Student (m/f/d) LLM Agent Evaluation & Benchmarking
Agile Robots SE
Location: Munich, Germany
Posted: June 03, 2026
Job Description
About the role
We are looking for a Working Student (m/f/d) for LLM Agent Evaluation & Benchmarking. In this role, you will design and build an agent-agnostic benchmarking harness, run comparative evaluations across frontier and local models, and translate the findings into improvements to prompts, guards, and tool schemas.
Your Responsibilities
- Harness Development: Design and build an agent-agnostic benchmarking harness that executes versioned task suites against frontier and local models with reproducible, version-controlled runs.
- Task Suite Design: Define and maintain evaluation task suites that measure task success, grounding accuracy, latency, and cost across the agent portfolio.
- Model Evaluation: Run period...