
HiringNearMe.work

Local Jobs, Zero Commute

📍 Local Job Near You

Senior LLM Inference Engineer — Performance & GPU Optimization

🏢 Company: Confidential
📍 Location: Singapore, Singapore
📅 Posted: May 30, 2026
🚗 Commute: Local Area

📋 Job Description

Own the performance of large language models in production: latency, throughput, and cost per token. This is deep inference-optimization work, profiling and tuning at the GPU and serving-engine level to make models run faster and cheaper at scale. You'll join a small, senior team at an established enterprise software company building LLM-powered capabilities into its products.


What you'll do:

  • Optimize LLM inference for latency, throughput, and cost — at the kernel and serving-engine level
  • Profile and tune GPU performance (CUDA, TensorRT-LLM); apply quantization, speculative decoding, and batching strategies
  • Get the most out of serving frameworks like vLLM, SGLang, and Triton — and extend them where they fall short
  • Optimize across hardware targets where relevant (NVIDIA and other accelerators)
  • Partner with model and platform teams to take new architectures from "it works" to "it's fast"


