{"id":73681,"title":"Cedana - Real-time save, move and resume for compute","tagline":"Solving critical GPU performance, reliability, capacity and cost problems","body":"Hey, we’re Neel and Niranjan from [Cedana](https://cedana.ai/?utm_source=cedana\u0026utm_medium=yc\u0026utm_campaign=launch).\n\n![uploaded image](/media/?type=post\u0026id=73681\u0026key=user_uploads/77597/b1097aa9-66f2-4253-8243-63e63d5c23e9)\n\n### **The Problem**\n\nLosing work because of infra problems is painful. Imagine you have a long-running compute job and the instance fails. Your 20-hour job finished, but because your pipeline was misconfigured, you have to restart it from scratch.\n\nBurning cash is stressful. Poor utilization results in your inference jobs costing more. If you’re managing a cluster of thousands of GPUs, poor utilization leaves money on the table even while demand is skyrocketing.\n\nLong cold start times hurt your customers’ satisfaction and their reliance on your solution.\n\nLimited GPU access makes it difficult for you to innovate, and finding GPUs can be a full-time job of constantly identifying, evaluating, and adapting to different vendors.\n\n### **Our Solution**\n\nCedana is real-time migration for compute. We automatically schedule and move workloads across instances and vendors without interrupting progress or breaking anything.\n\nThere are several critical benefits:\n\n* Utilization is maximized to save costs and eliminate idle resources.\n* Job-level SLAs dynamically allocate compute between inference and training, prioritizing costs, latency, and performance based on preferences.\n* Avoid re-running jobs from scratch due to infra, pipeline, or memory failures. 
Jobs continue their progress in the event of infra failure or spot revocation.\n* Access planet-scale compute through vendor aggregation.\n* Solve the cold start problem by enabling fast auto-suspend-resume.\n* Get spot management that automatically migrates workloads and provisions new instances upon revocation or failure.\n\nOpenAI, Meta, Microsoft, and Databricks employ some of these methods internally, and we’re bringing them to everyone.\n\n### **How it works**\n\n[Cedana](https://cedana.ai/?utm_source=cedana\u0026utm_medium=yc\u0026utm_campaign=launch) is available as an [open-source package](https://github.com/cedana/cedana-cli) and as a [managed service](https://www.ycombinator.com/launches/JAP-cedana-real-time-save-move-and-resume-for-compute).\n\n* Cedana needs no code modification and works with Linux processes or containers.\n* Current use cases and customers span AI training and inference, high-performance computing, DevTools, MLOps platforms, and computational biology.\n* It automatically provisions and manages infra with your existing credentials. Our managed service can leverage our vendor relationships if preferred.\n\nHere’s a 1m30s [demo video](https://www.youtube.com/watch?v=KC4STzSQ_DU).\n\n### **Our Team**\n\nOur team has built real-world robotics and large-scale computer vision systems at a number of places, including 6 River Systems/Shopify and MIT. We’ve led the development, commercialization, and scaling of NLP for clinical workflows used in the delivery of patient care. Our team’s publications span computer vision, computer graphics, robotics optimization, and spacecraft/aerospace controls, with patents in AI use cases for grid energy management, optimal battery control, and healthcare.\n\n### **We kindly ask**\n\n* Give Cedana a try! 
[Check out our GitHub repo](https://github.com/cedana/cedana-cli)\n* Please support us with a GitHub star","slug":"JAP-cedana-real-time-save-move-and-resume-for-compute","created_at":"2023-08-01T22:03:04.520Z","updated_at":"2026-05-25T01:25:59.650Z","total_vote_count":26,"url":"https://www.ycombinator.com/launches/JAP-cedana-real-time-save-move-and-resume-for-compute","share_image_url":"//bookface-static.ycombinator.com/assets/ycdc/yc-og-image-c440a0ad1dacfb86eeeb343717479cc54d256614449b4ef719977a0a451f8bc8.png","company":{"id":28812,"name":"Cedana","slug":"cedana","url":"https://cedana.ai","logo":"https://bookface-images.s3.amazonaws.com/small_logos/765d8a1a78a1c5939f354a26be87d3c1c1696290.png","batch":"Summer 2023","industry":"B2B","tags":["Deep Learning","Developer Tools","Cloud Computing","Infrastructure","AI"],"search_path":"https://bookface.ycombinator.com/company/28812"}}