TL;DR:
Lemma is the first evaluation + observability platform built not just to measure performance, but to improve it automatically. We help AI agents learn from real user feedback and production data, closing the loop so your prompts and agents continuously optimize themselves over time.
Launch Video: https://www.youtube.com/watch?v=E4_v-pY_4fs
Hey everyone! We’re Jerry and Cole, co-founders of Lemma (YC F25).
The Problem:
AI agents don’t learn from their mistakes. In fact, they get worse with use.
In production, prompts and agents degrade as real-world inputs drift (new user behaviors, unseen edge cases). Agent performance often drops ~40% within a few weeks, and suddenly what worked in testing breaks in front of customers.
When that happens, engineers are forced to dig through logs, collect failing examples, and manually tweak prompts rather than building core product features.
Solution:
That’s why we built Lemma: the first end-to-end system that closes the loop between agent deployment and improvement.
Here's what that means:
Step 1: Lemma detects failed outcomes directly from live traffic and automatically pinpoints the exact cause within your agent chain.
Step 2: Lemma alerts you, and with one click it runs targeted prompt optimizations to fix the failing behavior, with no manual tracing or guesswork.
Step 3: We hand you back an improved prompt and automatically open a PR in your codebase so your prompts live where you want them. Alternatively, you can fetch the prompt from the Lemma dashboard.
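To make the loop concrete, here's a minimal, purely illustrative sketch of what an integration could look like in code. The `LemmaClient` class, its `get_prompt` and `log_trace` methods, the `LEMMA_API_KEY` variable, and the "support-triage" prompt name are all hypothetical placeholders, not Lemma's actual SDK; they only show the shape of the fetch-prompt / run-agent / log-feedback cycle described above.

```python
# Purely illustrative sketch -- everything named here is a hypothetical
# placeholder, not Lemma's real SDK or API.
import os


class LemmaClient:
    """Hypothetical stand-in for a client that serves prompts and logs traces."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def get_prompt(self, name: str, default: str) -> str:
        # A real integration would fetch the latest optimized prompt version
        # (e.g., the one merged from an auto-opened PR); here we just return
        # the default so the sketch stays self-contained.
        return default

    def log_trace(self, prompt_name: str, inputs: dict, output: str,
                  feedback: str | None = None) -> None:
        # A real client would ship this trace plus user feedback to the
        # platform, where failed outcomes are detected and attributed to a
        # specific step in the agent chain. This stub does nothing.
        pass


client = LemmaClient(api_key=os.environ.get("LEMMA_API_KEY", ""))

# 1. Fetch the current prompt; optimized versions would replace the default over time.
support_prompt = client.get_prompt(
    name="support-triage",
    default="You are a support triage agent. Classify the ticket and draft a reply.",
)

# 2. Run your agent as usual, then log the trace and any user feedback so
#    failures can be detected from live traffic and fed back into optimization.
ticket = {"subject": "Refund not processed", "body": "I was charged twice."}
agent_output = "Category: billing. Draft: Sorry about the double charge..."  # your agent call goes here
client.log_trace(
    prompt_name="support-triage",
    inputs=ticket,
    output=agent_output,
    feedback="thumbs_down",  # real user feedback is what closes the loop
)
```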
Plus, Lemma provides all the LLM eval and observability features you already rely on, reimagined for continuous learning.
Teams using Lemma cut manual prompt iteration by 90%, resolve production drift in minutes instead of days, and improve model performance by ~2–5% with every optimization cycle.
Our Story:
We met freshman year at USC and have been building together ever since, instead of going to class.
Before starting Lemma, we were engineers at two high-growth, AI-native startups: Tandem (AI for healthcare) and Chipstack (AI agents for chip design). At both companies, setting up evaluations meant clunky Retool dashboards and multiple engineers manually tweaking experiments. We built internal systems that automated both the evaluation runs and the error-driven feedback loop. The result: 2x improvements in accuracy and iteration speed.
We soon realized every AI company was reinventing the same internal tooling in-house. So we left college, joined YC, and are now bringing continuous learning infrastructure to everyone else.
Ask:
Try our platform - If you’re building with LLMs and run a ton of prompt or eval experiments, we’d love to work with you.
Introductions - If you know a Head of AI/Eng or CTO at a pre-seed to Series A startup, we owe you lunch :)
Please reach out at jerry@uselemma.ai or book a live demo on our website uselemma.ai. All help is appreciated - thank you!