Morph builds the fastest LLM code-editing inference engine in the world. We hit 10,500 tok/sec per request on NVIDIA hardware.
Our stack powers high-throughput AI workflows for vibe coding apps, devtools, PR bots, and IDEs.
We’re hiring a founding ML Researcher to push the limits of model capability, throughput, and reliability across inference, retrieval, and edit application. This is a research role that ships. If your work cannot survive contact with production, it does not count here.
We’re looking for someone T-shaped: broad experience across research, systems, and product, plus a deep spike in modern LLM training and inference. You bring taste and judgment. AI can accelerate execution. It cannot replace those.
What You’ll Do
- Design and run experiments for LLMs specialized for code workflows: retrieval, search, editing, and tool use
- Train and fine-tune models (SFT + preference / RL variants), build evals, and close the loop until results are real
- Turn new research into production: model packaging, serving constraints, latency budgets, failure modes, monitoring
- Work directly on inference performance when it matters: KV cache strategy, batching, quantization, speculative decoding, kernel-level bottlenecks
- Collaborate on data strategy: high-signal datasets, preference data formats, automatic labeling, and rigorous evaluation
You’re a Fit If You
- Have PhD-level or equivalent experience with PyTorch (TF or JAX on top is fine)
- Can implement papers without cargo-culting them, explain why they work, and distinguish real results from noise
- Have shipped ML systems that run under real constraints: latency, cost, reliability, observability
- Understand modern LLM training mechanics and tradeoffs (data, objectives, RL, evals, inference)
- Prefer ownership and agency over committees and process theater
Bonus Points
- Experience with CUDA, kernels, Triton, TensorRT-LLM, vLLM, or custom inference stacks
- Experience with retrieval systems (embeddings, reranking, indexing) and evaluation methodology
- Strong opinions about what matters in ML, defended with evidence
Why Morph
- Zero fluff. Work directly with the founder. Everyone on the team is an ML engineer
- No busywork. If it doesn’t move the needle, we don’t do it
- Work on the fastest coding subagents in the world, and the research that makes them faster and smarter
Apply
- Describe the ML project you’re most proud of. Go deep on modeling choices, training setup, data, evals, failure cases, and what you’d do differently. The founder, a former ML engineer, reviews every application personally
- Describe what you’re deeply obsessed with (anything). We care about intensity and taste