HiringNearMe.work

AI Inference Performance Engineer

🏒
NVIDIA
πŸ“ Santa Clara, United States
πŸ“
Location Santa Clara
πŸ“…
Posted June 03, 2026
πŸš—
Commute Local Area
🎯
Local Opportunity Near You!

This job is in your area. Enjoy a short commute and work close to home.

Job Description

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry's performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.


What You Will Be Doing:
+ Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
+ Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks, multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its extreme on large-scale LLM-MoE models, vision-language models, video diffusion mode...
