{"id":84604,"title":"nCompass Technologies: Reliable LLM API with no rate-limits","tagline":"The most cost-effective way to set up and scale up your AI infrastructure.","body":"# **Tl;dr:**\n\nWe’ve built an AI model inferencing system that can serve requests at scale like no other, and now we’re releasing it to the public as a **rate-limit-free API**. We serve any open-source LLM and can also deploy optimized versions of your custom fine-tuned LLM with cost-effective autoscaling. [Sign up here](https://console.ncompass.tech/login), create an API key, get $100 of credit on us, and run as many requests as you like!\n\n# **The Problem**\n\nDeploying AI models in production requires expensive infrastructure. Serving more than \\~10 req/s using open-source inference engines like vLLM on a single GPU results in terrible quality of service. Time-to-first-token skyrockets to more than 10s, and end-to-end latency degrades even more!\n\n**The common solution**: horizontally scale up GPUs.\n\n**The problem**: GPUs are expensive and hard to find.\n\n# **Why should you care**\n\n1. **API user:** These high infrastructure costs are the reason you suffer rate limits when using existing API providers.\n2. **Deploying on-prem:** Your infrastructure costs might be the reason a PoC doesn’t move to production.\n\n# **Our Solution**\n\nWe’ve built an AI inference serving system that can sustain 100s of requests per second while maintaining a time-to-first-token of \u003c1s on \\~30% fewer GPUs when compared to NVIDIA’s NIM containers and up to **2x fewer GPUs** when compared to vLLM.\n\nThis enables us to provide a rate-limit-free API while maintaining a high quality of service. Alternatively, we can provide this as a cost-effective on-prem deployment solution, ensuring your infrastructure costs don’t blow up with requests served. 
We support any open-source model and can also host your custom fine-tuned model as an API with autoscaling enabled.\n\n# **Tutorials**\n\n* [**Sign Up**](https://www.loom.com/share/bac5c44a8e694b5783f3ba7c777ce40a?sid=6226613a-2cd6-45e6-bfc3-ac0381198b2a)\n* [**Models and Pricing**](https://www.loom.com/share/f7dc12aefe304399a3975feef7e38938?sid=34c2a174-4050-40bc-b65a-844d3002e48c)\n* [**Run Inference**](https://www.loom.com/share/0933dc3fa2ba442fa789286ef75e09b8?sid=74266aa3-f24e-46d7-bcce-1f3c61d7d82e)\n\n# **Shout out**\n\nTo build such a scalable and available system, we needed a top-quality hardware provider. We want to take this opportunity to shout out [Ori Global Cloud](https://www.ori.co/), a key partner in this journey, for enabling a serverless Kubernetes platform for AI inference at scale. [Ori Serverless Kubernetes](https://www.ori.co/serverless-kubernetes) is an infrastructure service that combines powerful scalability, simple management, and affordability to help AI-focused startups realize their wildest AI ambitions. [Reach out to Ori](https://www.ori.co/gpu-request) for exclusive GPU cloud deals!\n\n# **Asks**\n\n* Use our self-serve console (\u003chttps://console.ncompass.tech/login\u003e) to create an account and start running with $100 of credit.\n* Book a demo (\u003chttps://calendar.app.google/3jRDwcstFQvsbqnR8\u003e) if you would like to discuss an on-prem solution. 
YC deals apply!\n\nOur pricing is transparent and can be found here: \u003chttps://console.ncompass.tech/public-pricing\u003e","slug":"M0a-ncompass-technologies-reliable-llm-api-with-no-rate-limits","created_at":"2024-10-09T14:26:51.395Z","updated_at":"2026-05-25T00:32:28.232Z","total_vote_count":37,"url":"https://www.ycombinator.com/launches/M0a-ncompass-technologies-reliable-llm-api-with-no-rate-limits","share_image_url":"//bookface-static.ycombinator.com/assets/ycdc/yc-og-image-c440a0ad1dacfb86eeeb343717479cc54d256614449b4ef719977a0a451f8bc8.png","company":{"id":29266,"name":"nCompass Technologies","slug":"ncompass-technologies","url":"https://www.ncompass.tech","logo":"https://bookface-images.s3.amazonaws.com/small_logos/ec22cfec2eb1602e4ae48a85862a05e4be149771.png","batch":"Winter 2024","industry":"B2B","tags":["Artificial Intelligence","Developer Tools","Hardware","Open Source"],"search_path":"https://bookface.ycombinator.com/company/29266"}}