CellType

The agentic drug company. We simulate human biology.

Founding Research Engineer, Model Training

$150K - $250K · 0.50% - 2.00% · New York, NY, US
Job type
Full-time
Role
Engineering, Machine learning
Experience
3+ years
Visa
Will sponsor
Skills
Distributed Systems, Machine Learning, Reinforcement learning (RL)
Ivan Vrkic
Co-Founder

Location: New York City
Type: Full-time

About CellType

CellType is building foundation models and agent systems for biology.

We believe the next major advances in biotech AI will come from models trained to reason over biological data, experiments, and translational outcomes, not from lightweight wrappers around generic models. We work with pharma and biotech partners on problems such as preclinical-to-clinical translation, response prediction, biomarker discovery, and scientific reasoning across complex biological datasets. Our core technology was originally developed at Yale in collaboration with Google DeepMind, and has been published at top ML venues including ICML.

We are building the core intelligence layer for biology.

About the role

We are hiring a Founding Research Engineer to build and scale the systems that improve our models.

This role sits at the boundary of research and engineering. You will work on training, post-training, evaluation, performance optimization, and the systems needed to support all of that. You should be excited by both novel model development and the operational reality of making training systems run reliably.

What you'll do

  • Build and improve training and post-training systems for biological foundation models and agentic model workflows
  • Design and run experiments across supervised fine-tuning, reinforcement learning, tool use, evaluation, and model behavior optimization
  • Build and maintain distributed RL and post-training infrastructure
  • Improve reliability of rollout, evaluation, and reward pipelines
  • Own critical parts of the model training stack, including performance, reliability, observability, and debugging
  • Investigate and resolve issues across the full stack, from training dynamics and evaluation infrastructure to distributed systems and hardware bottlenecks
  • Profile and eliminate performance bottlenecks across GPU, networking, and storage layers
  • Build clean abstractions for experiments, model evaluation, and distributed training workflows
  • Improve training efficiency, stability, and throughput
  • Work closely with founders and domain experts to translate biological problems into model tasks, environments, and evaluation frameworks
  • Help turn research improvements into real product and customer advantage

You may be a fit if you

  • Have hands-on experience training or materially improving serious LLM or generative ML systems
  • Have strong software engineering and distributed systems fundamentals
  • Have deep experience with Python and modern ML frameworks such as PyTorch, JAX, or equivalent systems
  • Have experience with reinforcement learning or post-training methods
  • Have built evaluation systems for tool-using or open-ended models
  • Have a deep understanding of GPU execution constraints and memory trade-offs
  • Have experience debugging performance issues in production ML systems
  • Can reason about system-level trade-offs between latency, throughput, and cost
  • Have a track record of owning critical production infrastructure
  • Can balance research exploration with engineering implementation
  • Have experience with distributed systems, large-scale training, or performance-sensitive ML workloads
  • Care about code quality, testing, performance, and maintainability
  • Are comfortable in a small team where priorities move toward whatever is most important
  • Communicate clearly and collaborate well under both normal and high-pressure conditions
  • Want broad ownership rather than a narrow role boundary

This role will directly shape the quality and speed of CellType's core model systems. The right person will help determine not only how good our models become, but how fast we can improve them and how confidently we can deploy them.

If you want to work on difficult model problems with real scientific and commercial consequences, we'd love to talk.

About CellType

We are building models that learn directly from biological data, experiments, and outcomes. Our goal is to develop AI systems that are genuinely useful for scientific discovery and drug development.

We work on problems such as preclinical-to-clinical translation, response prediction, biomarker discovery, and reasoning across complex biological datasets. These are technically difficult problems with real scientific and commercial importance.

We are an early team with high standards for technical depth, speed, and ambition. This is a good fit for people who want to work on hard model and systems problems in a domain that matters.

CellType
Founded: 2025
Batch: W26
Team Size: 2
Status: Active
Location: New York City, NY
Founders
David van Dijk
Co-Founder & CEO
Ivan Vrkic
Co-Founder