Expanse: Unlock wasted GPU capacity.

Expanse unlocks wasted GPU capacity. We recover idle compute through three capabilities: resource prediction (right-sizing job submissions before they reach the scheduler), optimisation suggestions (code and config changes researchers can apply themselves), and failure prediction (catching jobs that will fail before they consume hours of GPU time). We’re four engineers who ran HPC and GPU training workloads at the largest quant funds and national supercomputing centres. We faced this problem first-hand, and the only fix was to over-provision and burn millions. As research at EPCC (Edinburgh’s Parallel Computing Centre), Ismaeel built the first multimodal HPC resource predictor, which beat every published baseline. This is the tool we wish we had.

Active Founders

Ismaeel Bashir

Founder

Ismaeel is co-founder and CEO of Expanse. Built the first multimodal HPC resource predictor at EPCC (Edinburgh’s Parallel Computing Centre), beating every published baseline. Previously: ran large-scale ML models at one of the world’s largest quantitative funds (QRT). Studied Computer Science at the University of Edinburgh.

Nikodem Bieniek

Founder

Nikodem is co-founder and CTO of Expanse. Trained and optimised speech recognition models on GPU clusters. Previously: managed the platforms researchers and engineers depended on at one of the world’s largest hedge funds (Millennium). Studied Computer Science at the University of Edinburgh.

Yafet Melake

Founder

Yafet is co-founder and COO of Expanse. Built the first GNN-based cluster graph network for predicting SLURM queue wait times at EPCC (Edinburgh’s Parallel Computing Centre). Previously: worked on tooling and infrastructure for researchers at one of the world’s largest quantitative funds (G-Research). Studied Computer Science at the University of Edinburgh.

Eren Mendi

Founder

Eren is co-founder and CPO of Expanse. Built state-of-the-art decentralised foundation model training systems and performance models. Previously: prototyped emerging technologies in quantitative finance (G-Research). Studied Computer Science at the University of Edinburgh and Imperial College London.

Latest News

What Y Combinator's Latest Batch Reveals About The Future

Jun 08, 2026

Company Launches

Expanse - Unlock wasted GPU capacity

TL;DR

Expanse unlocks wasted GPU capacity. We’re the intelligence layer for compute infrastructure. Submit jobs with the right resources. Optimise them to run faster. Debug failures in seconds, not days. Works on cloud and on-prem HPC.

Want to know how much more you can get out of your hardware? Schedule a call: https://book.getexpanse.io

Problem

Running compute on a cluster means guessing memory, time, and GPUs before you submit. Get it wrong and your job fails mid-run. Over-guess and you pay for compute you never use. 30% of cloud spend is lost to over-allocation [1]. Microsoft Research found their own deep learning jobs hit 50% GPU utilisation or lower [2].
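
To make the cost of over-guessing concrete, here is a minimal Python sketch that compares what a job requested with what it actually used. The record fields and the two sample jobs are hypothetical illustrations, not measurements from Expanse or from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Requested resources vs. observed usage for one finished job (hypothetical fields)."""
    requested_gpus: int
    requested_mem_gb: float
    runtime_hours: float
    mean_gpu_utilisation: float  # 0.0 to 1.0, averaged over the run
    peak_mem_gb: float


def allocated_gpu_hours(job: JobRecord) -> float:
    """GPU-hours reserved for the job, whether or not they were used."""
    return job.requested_gpus * job.runtime_hours


def idle_gpu_hours(job: JobRecord) -> float:
    """GPU-hours that were reserved but sat idle."""
    return allocated_gpu_hours(job) * (1.0 - job.mean_gpu_utilisation)


def unused_mem_fraction(job: JobRecord) -> float:
    """Share of requested memory the job never touched."""
    return max(0.0, 1.0 - job.peak_mem_gb / job.requested_mem_gb)


# Illustrative numbers only.
jobs = [
    JobRecord(requested_gpus=8, requested_mem_gb=512.0, runtime_hours=10.0,
              mean_gpu_utilisation=0.50, peak_mem_gb=128.0),
    JobRecord(requested_gpus=4, requested_mem_gb=256.0, runtime_hours=5.0,
              mean_gpu_utilisation=0.75, peak_mem_gb=192.0),
]

total_allocated = sum(allocated_gpu_hours(j) for j in jobs)
total_idle = sum(idle_gpu_hours(j) for j in jobs)
print(f"Idle share of allocated GPU-hours: {total_idle / total_allocated:.0%}")
```

In this made-up example, the first job requests four times the memory it peaks at and leaves half of its GPU-hours idle, which is the kind of gap that right-sizing targets.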

Sizing jobs well takes decades of experience with memory patterns, batching, GPU threading, and CPU vectorisation. AI agents can’t fix it either, because SLURM and Kubernetes don’t expose enough structured data. Even strong platform teams are forced to choose: spend time debugging and optimising every job, or accept the waste. AI infrastructure spending will hit $2.52 trillion in 2026 [3]. The waste grows with every new cluster. This problem isn’t going away.

Solution

Expanse learns your cluster. We analyse your code and hardware telemetry, then track them over time. Our deep learning models build a picture of what your workloads actually need. Predictions get sharper with every job.

Our models are deployed on your cluster. All analysis runs on your network. Your code and telemetry never leave your infrastructure. We can’t see your data, and you maintain full data sovereignty.

One command installs Expanse on your cluster. You get two tools.

expanse analyse - for every job:

Suggests the right resource configuration
Predicts failures from historical code and telemetry patterns
Surfaces code-level optimisations that increase efficiency
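
As a rough illustration of what a resource suggestion can look like, the sketch below right-sizes a memory request from a job's past peaks using a high percentile plus headroom. This is a simple statistical stand-in, not Expanse's deep learning models; the function names, the 90th-percentile choice, and the 20% headroom are all assumptions.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_memory_gb(peak_history_gb, pct=90, headroom=0.2):
    """Suggest a memory request in whole GB: a percentile of past peaks plus headroom."""
    base = nearest_rank_percentile(peak_history_gb, pct)
    return math.ceil(base * (1 + headroom))


# Hypothetical peak memory (GB) from ten earlier runs of the same job.
history = [10, 12, 11, 40, 13, 12, 11, 12, 13, 12]
print(suggest_memory_gb(history))
```

The percentile sets the trade-off: a lower value trims waste but lets the rare outlier run (the 40 GB peak here) exceed the request, while a higher value covers outliers at the cost of over-allocating the typical run.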

expanse diagnose - for every failure:

Finds the root cause
Returns solution-oriented logs with the exact code and config changes to fix it
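
For a sense of the simplest form root-cause matching can take, here is a rule-based Python sketch that scans a job log for a few well-known failure signatures and returns a suggested fix. It is an assumption-laden stand-in and not how expanse diagnose works: the rule table, cause labels, and fix wording are hypothetical, and the sample log lines are illustrative.

```python
import re

# Hypothetical rule table: (pattern, cause label, suggested fix).
RULES = [
    (re.compile(r"CUDA out of memory", re.IGNORECASE), "gpu_oom",
     "Reduce the batch size or request GPUs with more memory."),
    (re.compile(r"oom_kill", re.IGNORECASE), "host_oom",
     "Request more host memory for the job."),
    (re.compile(r"DUE TO TIME LIMIT"), "time_limit",
     "Request a longer wall time or checkpoint more often."),
]


def match_known_failure(log_text):
    """Return the first matching cause, fix, and evidence line, or None if nothing matches."""
    for line in log_text.splitlines():
        for pattern, cause, fix in RULES:
            if pattern.search(line):
                return {"cause": cause, "fix": fix, "evidence": line.strip()}
    return None


# Illustrative log excerpt.
sample_log = """epoch 3 step 1200 loss 0.41
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
"""
result = match_known_failure(sample_log)
print(result["cause"], "->", result["fix"])
```

Fixed patterns like these only catch failures someone has already seen and written a rule for, which is the gap that learning from historical code and telemetry is meant to close.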

Days of debugging and optimising become seconds. Platform teams get what they need to debug and optimise jobs in minutes instead of weeks.

Founders

We're four engineers from the University of Edinburgh, all coming from top quant funds and national supercomputing centres. Ismaeel built the first multimodal HPC resource predictor at EPCC (Edinburgh's Parallel Computing Centre), beating every published baseline. Yafet built the first GNN cluster graph network for predicting SLURM queue wait times. Eren built decentralised foundation model training systems at Edinburgh. Niko trained large-scale speech ML models on GPU clusters and architected cloud and Kubernetes infrastructure at one of the largest hedge funds in the world. We left our jobs to fix this ourselves.

CTA / Ask

Running compute and not getting enough out of what you’re paying for? Failing jobs, unclear errors, bills too high? We’d love to chat.

Know anyone in AI infrastructure (managing cloud and/or on-premise data centres), quant finance, drug discovery, or AI research labs? An intro would mean a lot :)

YC Deal

All YC companies running serious ML training can contact us for a discounted pilot. We’ll find inefficiencies and wastage on your compute setup. Then we deploy Expanse, no changes to your workflows. You get more out of the hardware you’re paying for.

Reach out: founders@expanse.sh or https://book.getexpanse.io

References

[1] https://grafana.com/blog/2023/03/03/how-to-optimize-resource-utilization-with-kubernetes-monitoring-for-grafana-cloud

[2] https://www.microsoft.com/en-us/research/publication/an-empirical-study-on-low-gpu-utilization-of-deep-learning-jobs/

[3] https://www.gartner.com/en/newsroom/press-releases/2026-1-15-gartner-says-worldwide-ai-spending-will-total-2-point-5-trillion-dollars-in-2026