Senior MLOps Engineer (ML/LLM) - Visas Supported
European Tech Recruit
Location: Donostia / San Sebastian, Spain
Posted: June 07, 2026
Job Description
Senior MLOps Engineer
We are seeking a Senior MLOps Engineer to steer the technical vision of our Training and Inference Optimization team. In this high-impact role, you will architect the infrastructure that powers our next-generation AI models. You will bridge the gap between systems programming and machine learning, optimizing large-scale LLM training via NVIDIA NeMo and building ultra-high-throughput serving systems using vLLM, TensorRT-LLM, and SGLang.
Your mission is to ensure our models are not only state-of-the-art but also production-hardened, cost-efficient, and performant at scale.
Key Responsibilities
• Training Infrastructure: Architect and maintain scalable distributed training pipelines using NVIDIA NeMo/Nemotron/Megatron-Bridge. You will optimize GPU utilization, manage complex checkpointing strategies, and implement automated fault tolerance for long-running jobs.
• Inference Orchestration...