Luminal

Making AI run fast on any hardware.

Cloud Inference Engineer

Salary
$150K - $350K
Equity
0.50% - 2.00%
Location
San Francisco, CA, US
Job type
Full-time
Role
Engineering, Machine learning
Experience
Any (new grads ok)
Visa
US citizen/visa only
Skills
Torch/PyTorch, CUDA
Jake Stevens
Founder

About the role

Qualifications

  • CUDA + GPU inference optimization
  • vLLM, SGLang, or TensorRT-LLM experience
  • KV caching, paged attention, batching, token streaming, etc.
  • Distributed compute (experience with GPUs is a big plus)
  • No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with a single line of code.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day-to-day responsibilities:

  • Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
  • Conduct model performance reviews
  • Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
  • Occasionally write kernels (and, yes, do some tasteful shitposting)

About Luminal

Founded: 2025
Batch: S25
Team Size: 5
Status: Active
Location: San Francisco
Founders
Joe Fioti
Founder / CEO
Matthew Gunton
Founder
Jake Stevens
Founder