Senior LLM Inference Engineer — Performance & GPU Optimization

Company: Confidential
Location: Singapore, Singapore
Posted: May 30, 2026

Job Description

Own the performance of large language models in production — the latency, the throughput, the cost-per-token. This is deep inference-optimization work: profiling and tuning at the GPU and serving-engine level to make models run faster and cheaper at scale. You'll join a small, senior team at an established enterprise software company building LLM-powered capabilities into its products.

What you'll do:
- Optimize LLM inference for latency, throughput, and cost at the kernel and serving-engine level
- Profile and tune GPU performance (CUDA, TensorRT-LLM); apply quantization, speculative decoding, and batching strategies
- Get the most out of serving frameworks like vLLM, SGLang, and Triton, and extend them where they fall short
- Optimize across hardware targets where relevant (NVIDIA and other accelerators)
- Partner with model and platform teams to take new architectures from "it works" to "it's fast"
