{"id":79124,"title":"nCompass Technologies - Low-latency deployment of AI models made easy","tagline":"nCompass is an API that requires only one line of code to integrate low-latency versions of open-source/custom models into your AI pipeline.","body":"**tl;dr** If OpenAI’s unpredictable response times and rate limits are causing your tool’s user experience to suffer, [nCompass](https://ncompass.tech/) allows you to effortlessly tap into the world of open-source AI models while ensuring that the served models meet your target budget and performance requirements.\n\n---\n\nHey all, we are [Diederik](https://linkedin.com/in/diederik-vink) and [Aditya](https://linkedin.com/in/adityarajagopal), the co-founders of [nCompass](https://ncompass.tech/), a platform for simplified hosting and acceleration of open-source and custom LLMs.\n\n### **The Problem**\n\nLLM-based products that use closed-source model providers like OpenAI suffer from slow response times and rate limits. \n\nOpen-source models are a great alternative, but hosting a model yourself is a lot of extra work and maintenance, which distracts you from your core business.\n\n### **Our solution**\n\nnCompass provides an API that allows you to integrate accelerated versions of any open-source or custom model of your choice into your AI pipeline. We support OpenAI-style chat templates, work with all web frameworks, and have a time-based pricing model that results in a predictable compute cost for users.\n\n### **How it works**\n\nWe serve models to users with a simple 3-step process:\n\n1. Select your desired open-source / custom model\n2. Provide your performance requirements\n3. 
Set a budget you are not willing to exceed\n\nWe set up a deployment that meets these requirements and provide you with a single API key that you can then use to integrate the model with a single line of code.\n\nWe support any model currently hosted on Hugging Face, with some highlights being: \n\n* **Mistral-7B:** 160 ms Time-To-First-Token @ 86 tok/s\n* **Mixtral-8x7B:** 300 ms Time-To-First-Token @ 64 tok/s\n\n### **Demo**\n\n\u003chttps://www.youtube.com/watch?v=sdHVji8QGOg\u003e \n\nAlso, check out our [GitHub](https://github.com/nCompass-tech/nCompass) repository for code examples.\n\n### **The team**\n\nWe met in undergrad 9 years ago and have worked on every project together since, through to our PhDs at Imperial College London. Our PhDs focused on hardware acceleration of large-scale machine learning models covering all levels of the stack from algorithms and compilers down to digital hardware design.\n\n![uploaded image](/media/?type=post\u0026id=79124\u0026key=user_uploads/1641909/8f0f2b1e-8c86-4269-828e-a90176159dd6)\n\n### **Asks**\n\n* Book a [demo](https://calendar.app.google/eLCRjByYErh9X2hh9) \n* **Warm intros** to anyone you know who requires accelerated and/or hosted versions of open-source models.\n\nOur emails are [aditya.rajagopal@ncompass.tech](mailto:aditya.rajagopal@ncompass.tech) and [diederik.vink@ncompass.tech](mailto:diederik.vink@ncompass.tech).","slug":"KaC-ncompass-technologies-low-latency-deployment-of-ai-models-made-easy","created_at":"2024-03-06T18:41:41.718Z","updated_at":"2026-05-25T02:07:56.715Z","total_vote_count":97,"url":"https://www.ycombinator.com/launches/KaC-ncompass-technologies-low-latency-deployment-of-ai-models-made-easy","share_image_url":"//bookface-static.ycombinator.com/assets/ycdc/yc-og-image-c440a0ad1dacfb86eeeb343717479cc54d256614449b4ef719977a0a451f8bc8.png","company":{"id":29266,"name":"nCompass 
Technologies","slug":"ncompass-technologies","url":"https://www.ncompass.tech","logo":"https://bookface-images.s3.amazonaws.com/small_logos/ec22cfec2eb1602e4ae48a85862a05e4be149771.png","batch":"Winter 2024","industry":"B2B","tags":["Artificial Intelligence","Developer Tools","Hardware","Open Source"],"search_path":"https://bookface.ycombinator.com/company/29266"}}