{"id":79074,"title":"Ragas: Open-source evaluation and testing Infrastructure for LLM applications","tagline":"Deploy your LLM application with confidence. ","body":"### **TL;DR**\n\nWe are building [Ragas](https://www.ragas.io/), an open-source evaluation and testing infrastructure that lets LLM application developers deploy their applications to production with confidence.\n\n### About us\n\n![uploaded image](/media/?type=post\u0026id=79074\u0026key=user_uploads/889503/21a941e3-4798-4628-95cc-15332facb47e)\n\nWe’re [Jithin](https://linkedin.com/in/jjmachan) and [Shahul](https://linkedin.com/in/shahules)! We met in college and have collaborated on various projects for almost a decade.\n\nJithin builds the software and infrastructure. He was an early employee at BentoML, where he built and maintained tools like [Bentoctl](https://github.com/bentoml/bentoctl), [BentoML](https://github.com/bentoml/BentoML), and [Yatai](https://github.com/bentoml/Yatai). Shahul is responsible for AI research and engineering. He is a [Kaggle Grandmaster](https://www.kaggle.com/shahules) and a lead contributor to several open-source AI projects, including [Open-Assistant AI](https://github.com/laion-ai/open-assistant).\n\n### Problem\n\nBefore 2023, software was written in code. With the emergence of foundation models, software and applications are becoming compound systems that contain code, prompts, and other components. 
This introduces several new problems:\n\n* How do you select the best model or component for your application from the abundance of available options?\n* How do you test these systems and ensure continuous quality?\n* How do you derive insights from production to measure and improve your system?\n\nAs early adopters of this technology, we ran into these problems ourselves while building RAG systems early last year.\n\n### Solution\n\nWe at [Ragas](https://docs.ragas.io/) use model-graded evaluations and testing techniques to ensure quality. This includes automated synthesis of test data points, explainable metrics, and adversarial testing.\n\nWe started by building this for RAG, the most popular application of LLMs today. Ragas is now the default open-source standard for evaluating RAG applications. It processed over 4.7 million responses last month and is used by engineers from enterprises like AWS, Microsoft, Databricks, Moody’s, UHG, and Tencent.\n\n![uploaded image](/media/?type=post\u0026id=79074\u0026key=user_uploads/889503/4da2d427-8bfe-4e4b-a76e-01c7a28ce747)\n\n### Our Ask\n\n* Check out Ragas on [GitHub](https://github.com/explodinggradients/ragas)\n* If you’re building RAG applications, consider [applying](https://forms.gle/TnqUHuY176m2juqL9) for the Ragas office hours program","slug":"KZO-ragas-open-source-evaluation-and-testing-infrastructure-for-llm-applications","created_at":"2024-03-05T18:38:36.494Z","updated_at":"2026-05-25T05:00:31.785Z","total_vote_count":94,"url":"https://www.ycombinator.com/launches/KZO-ragas-open-source-evaluation-and-testing-infrastructure-for-llm-applications","share_image_url":"//bookface-static.ycombinator.com/assets/ycdc/yc-og-image-c440a0ad1dacfb86eeeb343717479cc54d256614449b4ef719977a0a451f8bc8.png","company":{"id":29347,"name":"Vibrant 
Labs","slug":"vibrant-labs","url":"https://vibrantlabs.com/","logo":"https://bookface-images.s3.amazonaws.com/small_logos/b03fadf62fa443c473d34c31148c9785ea34dceb.png","batch":"Winter 2024","industry":"B2B","tags":["Developer Tools","Generative AI","Reinforcement Learning","Open Source","AI"],"search_path":"https://bookface.ycombinator.com/company/29347"}}