Luminal

Making AI run fast on any hardware.

Cloud Inference Engineer

$150K - $250K • 0.15% - 0.75% • San Francisco, CA, US

Job type: Full-time

Role: Engineering, Machine learning

Experience: Any (new grads ok)

Visa: US citizen/visa only

Skills: Torch/PyTorch, CUDA

Jake Stevens

Founder

About the role

Qualifications

CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute (experience with GPUs is a super plus)
No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day to day responsibilities:

Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
Conduct model performance reviews
Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
Sometimes write kernels and, yes, occasional tasteful shitposting

About Luminal

Founded: 2025

Batch: S25

Team Size: 5

Status: Active

Location: San Francisco

Founders

Joe Fioti

Founder / CEO

Matthew Gunton

Founder

Jake Stevens

Founder
