{"id":90592,"title":"Theorem – AI coding that is trustworthy-by-default","tagline":"Program verification for superhumanly complex code","body":"### **TL;DR:** **Please give us your most complicated coding problem that _needs_ to be correct. [Sign up!](https://theoremlabs.com/waitlist/)**\n\nHello World! We’re [Jason](https://jasongross.github.io/) and [Rajashree](https://scholar.google.com/citations?user=41ZE3NwAAAAJ\u0026hl=en\u0026oi=ao)! We think AI code generation is going to be bigger than any of us can imagine.\n\n### **MISSION**\n\nTheorem is an AI and programming languages research lab. Our question: as the models get better at generating code, what is the bottleneck to safely deploying vastly more computation in the world? \n\n* _Code has bugs._  AI-enabled attackers will increase the volume and velocity of attacks on software infrastructure. Human code review will not scale.\n* _AIs are untrusted generators._ In order to deploy AIs in critical systems without fully understanding their generalization behavior, we’ll need robust, scalable methods for overseeing their solutions.\n\n### **OPPORTUNITY**\n\nProgramming languages and formal methods is the promise that we can bootstrap roots of trust. Even as complexity of software goes up, we can design simple interfaces for steering with leverage.\n\nPreviously, this technology was only applied to the most mission-critical infrastructure, because of the enormous labor bottleneck — requiring years of PhD level engineering work. For instance, Jason worked on the project that wrote the code generator powering trillions of secure https connections everyday, but it took fifteen man-years to complete!\n\nWith advances in AI, we’re able to speed this development cycle 10,000 times! As a [wild example](https://www.youtube.com/watch?v=O9nKMjjb3Hg) of what’s possible, we spent two hours finding what would have been a zero-day in https code from 2011.\n\n### **OUR APPROACH**\n\nWe want to build workflows for trustworthy-by-default programming by pushing our best ideas till they break. The current candidate to beat is _program equivalence_ driven development. You oversee a simple reference implementation. We give you an optimized implementation, along with a proof that the programs are functionally equivalent. \n\n![uploaded image](/media/?type=post\u0026id=90592\u0026key=user_uploads/2206829/f5c671df-ac60-4b94-8081-36ae8c0aeda4)\n\nWe’re gradually rolling out our programming environment and models. Some use cases from our customers:\n\n* Performance optimization: finding edge cases where hyper-optimized GPU code would fail; getting clear documentation of dependencies that modify behavior in unforeseen ways. \n* Code migration: automating legacy code refactors from Python to Rust while proving no functional change; writing custom kernels to support new services.\n\n**If you work in public infrastructure, finance, hardware, or have other use cases that demand correctness, we’d love to help. Please reach out at [founders@theoremlabs.com](mailto:founders@theoremlabs.com)**","slug":"NZA-theorem-ai-coding-that-is-trustworthy-by-default","created_at":"2025-05-21T14:58:50.047Z","updated_at":"2026-07-22T04:40:14.817Z","total_vote_count":82,"url":"https://www.ycombinator.com/launches/NZA-theorem-ai-coding-that-is-trustworthy-by-default","share_image_url":"https://www.ycombinator.com/media/?type=post\u0026id=90592\u0026key=user_uploads/2206829/f5c671df-ac60-4b94-8081-36ae8c0aeda4","company":{"id":30548,"name":"Theorem","slug":"theorem-2","url":"https://theorem.dev","logo":"https://bookface-images.s3.amazonaws.com/small_logos/4462cecf7e86ccbeaa67750a89b934e1dcf2fa9f.png","batch":"Spring 2025","industry":"B2B","tags":["Machine Learning"],"search_path":"https://bookface.ycombinator.com/company/30548"}}