Location
Sevilla
Posted
May 31, 2026
Job Description
Job Title:
LLM Evaluator (Model Response Analyst)
Location:
Remote (Worldwide)
Job Summary:
We are seeking a detail-oriented and analytical LLM Evaluator to assess, analyze, and improve the performance of large language models (LLMs). In this role, you will evaluate AI-generated content for accuracy, coherence, factual reliability, bias, safety, and alignment with defined guidelines.
Responsibilities
Evaluate and rank model-generated text based on complex rubrics covering dimensions such as factuality, coherence, safety, instruction‑following, and creativity.
Review multiple model responses to the same prompt and determine which output a human would prefer, providing justifications for your choices.
Provide clear, concise feedback to the modeling and training teams regarding recurring failure modes observed during eval...