Staff / Principal Machine Learning Engineer, Serving - Switzerland
Job Description
Who We're Looking For
A year ago, reliably working agentic systems and sub‑second multimodal inference at scale barely existed. Nobody has a decade of experience here. So we're not screening for a resume template — we're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.
Experience We Find Useful
Inference Optimization. Deep understanding of modern serving frameworks such as vLLM or TRT‑LLM, and of the techniques they implement.
Model Acceleration. Hands‑on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
High‑Performance Systems. Proficiency in C++, CUDA, Rust, or highly optimised Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs.
Distributed Systems & Scaling. Experi...