Location
Colombia
Posted
June 25, 2026
Job Description
This is a remote position.
Owns the eval harness and quality gate from the beginning. This role replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.
Key Responsibilities
• Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
• Wire evals into CI so quality regressions fail builds and releases.
• Define and maintain release-gate thresholds with Product and the Tech Lead.
• Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.
Requirements
Must-Have Qualifications
• Experience evaluating ML, LLM, or non-deterministic systems.
• Strong tes...