{"id":100703,"title":"Expanse - Unlock wasted GPU capacity","tagline":"Unlock wasted GPU capacity.","body":"**TL;DR**\n\nExpanse unlocks wasted GPU capacity. We’re the intelligence layer for compute infrastructure. Submit jobs with the right resources. Optimise them to run faster. Debug failures in seconds, not days. Works on cloud and on-prem HPC.\n\nWant to know how much more you can get out of your hardware? Schedule a call: \u003chttps://book.getexpanse.io\u003e\n\n**Problem**\n\nRunning compute on a cluster means guessing memory, time, and GPUs before you submit. Under-guess and your job fails mid-run. Over-guess and you pay for compute you never use. 30% of cloud spend is lost to over-allocation \\[1\\]. Microsoft Research found their own deep learning jobs hit 50% GPU utilisation or lower \\[2\\].\n\nSizing jobs takes decades of experience. Memory patterns, batching, GPU threading, CPU vectorisation. AI agents don’t have enough information to fix it either. SLURM and Kubernetes don’t expose enough structured data. Even strong platform teams are forced to choose: spend time debugging and optimising every job, or accept the waste. Gartner forecasts worldwide AI spending will reach $2.52 trillion in 2026 \\[3\\]. The waste grows with every new cluster. This problem isn’t going away.\n\n**Solution**\n\nExpanse learns your cluster. We analyse your code and hardware telemetry, then track them over time. Our deep learning models build a picture of what your workloads actually need. Predictions get sharper with every job.\n\nOur models are deployed on your cluster. All analysis runs on your network. Your code and telemetry never leave your infrastructure. We can’t see your data, and you maintain full data sovereignty.\n\nOne command installs Expanse on your cluster. 
You get two tools.\n\n`expanse analyse` - for every job:\n\n* **Suggests the right resource configuration**\n* **Predicts failures from historical code and telemetry patterns**\n* **Surfaces code-level optimisations that increase efficiency**\n\n`expanse diagnose` - for every failure:\n\n* **Finds the root cause**\n* **Returns solution-oriented logs with the exact code and config changes to fix it**\n\nDays of debugging and optimising become seconds. Platform teams get what they need to do this in minutes instead of weeks.\n\n**Founders**\n\nWe're four engineers from the University of Edinburgh, all coming from top quant funds and national supercomputing centres. Ismaeel built the first multimodal HPC resource predictor at EPCC (Edinburgh Parallel Computing Centre), beating every published baseline. Yafet built the first GNN cluster graph network for predicting SLURM queue wait times. Eren built decentralised foundation model training systems at Edinburgh. Niko trained large-scale speech ML models on GPU clusters and architected cloud and Kubernetes infrastructure at one of the largest hedge funds in the world. We left our jobs to fix this ourselves.\n\n**CTA / Ask**\n\nRunning compute and not getting enough out of what you’re paying for? Failing jobs, unclear errors, bills too high? We’d love to chat.\n\nKnow anyone in AI infrastructure (managing cloud and/or on-premises data centres), quant finance, drug discovery, or AI research labs? An intro would mean a lot :)\n\n**YC Deal**\n\nAll YC companies running serious ML training can contact us for a discounted pilot. We’ll find inefficiencies and wastage in your compute setup. Then we deploy Expanse with no changes to your workflows. 
You get more out of the hardware you’re paying for.\n\nReach out: [founders@expanse.sh](mailto:founders@expanse.sh) or \u003chttps://book.getexpanse.io\u003e\n\nReferences\n\n\\[1\\] \u003chttps://grafana.com/blog/2023/03/03/how-to-optimize-resource-utilization-with-kubernetes-monitoring-for-grafana-cloud\u003e\n\n\\[2\\] \u003chttps://www.microsoft.com/en-us/research/publication/an-empirical-study-on-low-gpu-utilization-of-deep-learning-jobs/\u003e\n\n\\[3\\] \u003chttps://www.gartner.com/en/newsroom/press-releases/2026-1-15-gartner-says-worldwide-ai-spending-will-total-2-point-5-trillion-dollars-in-2026\u003e","slug":"QCF-expanse-unlock-wasted-gpu-capacity","created_at":"2026-05-04T04:30:18.506Z","updated_at":"2026-07-22T13:15:32.609Z","total_vote_count":52,"url":"https://www.ycombinator.com/launches/QCF-expanse-unlock-wasted-gpu-capacity","share_image_url":"//bookface-static.ycombinator.com/assets/ycdc/yc-og-image-c440a0ad1dacfb86eeeb343717479cc54d256614449b4ef719977a0a451f8bc8.png","company":{"id":31516,"name":"Expanse","slug":"expanse","url":"https://expanse.sh","logo":"https://bookface-images.s3.amazonaws.com/small_logos/26ca5cac6157441dd12d0352f7d3fd4a77c4c39a.png","batch":"Spring 2026","industry":"B2B","tags":["Deep Learning","B2B","Infrastructure","ML"],"search_path":"https://bookface.ycombinator.com/company/31516"}}