
Moss

Real-time semantic search for Conversational AI

Moss is building the real-time semantic search runtime for conversational and multimodal AI. Our system enables voice agents, copilots, and chat interfaces to retrieve, reason, and respond in under 10 ms, delivering the responsiveness that makes AI interactions feel truly natural.

If you’ve ever built a conversational or voice AI product, you’ve felt the lag: that moment when an agent pauses and the illusion of intelligence breaks. The bottleneck is almost always retrieval. Each query hops across networks and databases, adding delay and cost. Moss eliminates that gap by keeping retrieval close to where the agent runs.

Moss runs natively across browsers, mobile devices, and servers with an optimized vector index built in Rust and WebAssembly. It enables teams to build AI products that feel instant, contextual, and adaptive: the experiences that sustain engagement and unlock new kinds of interaction. For our customers, this translates into tangible business value: stronger user retention, higher conversion, and entirely new product categories made possible by real-time understanding.

Moss is already powering production pilots across voice AI and developer platforms, achieving sub-10 ms retrieval and 70–90% token savings compared to traditional pipelines.
Active Founders
Sri Raghu Malireddi
Founder
Founder & CEO of Moss. We are developing real-time semantic search for conversational AI.
Harsha Nalluru
Founder
Co-founder & CTO of Moss. We are developing real-time semantic search for conversational AI.
Company Launches
Moss: Real-time Semantic Search for Conversational AI
See original launch post

Hello YC! We’re Sri Raghu Malireddi and Harsha Nalluru from Moss.

TL;DR

Moss is a high-performance runtime for real-time semantic search. It delivers sub-10 ms lookups, instant index updates, and zero infra overhead. Moss runs where your agent lives - cloud, in-browser, or on-device - so search feels native and users never wait. You connect your data once; Moss handles indexing, packaging, distribution, and updates.


The Problem

If you’ve ever built a conversational or voice AI product, you’ve felt it - that awkward pause when your agent lags or hesitates. The illusion of conversation breaks, and suddenly it feels less like talking to intelligence and more like waiting for a page to load.

The culprit is almost always retrieval. Every query hops across networks and cloud databases, adding seconds of delay. As usage scales, those small lags snowball into lost users and rising infra and egress costs. Teams spend weeks rebuilding embeddings and indexes, tuning search infra just to get “good enough” answers instead of focusing on what actually matters: building great AI experiences!

Solution

Moss puts real-time semantic search in the same runtime as your agent and application, so you can:

  • Keep retrieval close - embed & index right alongside your agents (docs, chat history, telemetry).
  • Answer in <10 ms - local lookups, zero network hops.
  • Manage at scale - store, sync, and distribute indexes with our managed data layer at https://usemoss.dev
  • Drop-in SDKs - JavaScript and Python.
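The core claim above - that an in-process lookup skips the network hop entirely - can be illustrated with a toy brute-force vector search. This is plain Python, not Moss’s Rust/WebAssembly index; the documents and embedding vectors are invented for the sketch:

```python
# Toy in-process vector lookup: retrieval stays in the agent's process,
# so there is no network round-trip. NOT Moss's actual index - just an
# illustration of local retrieval with made-up 3-dim "embeddings".
import math
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Pretend these are precomputed document embeddings kept alongside the agent.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "account setup": [0.0, 0.2, 0.9],
}

def search(query_vec, k=1):
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

start = time.perf_counter()
top = search([0.85, 0.15, 0.05])
elapsed_ms = (time.perf_counter() - start) * 1000

print(top[0])  # -> refund policy
```

A real index replaces the linear scan with an approximate-nearest-neighbor structure, but the latency argument is the same: the lookup is a function call, not a network request.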

Our Story

The Moss founding team has known each other for 8+ years and brings deep expertise in machine learning, high-performance computing, and developer experience.

Sri was an ML Lead at Grammarly and Microsoft, where he shipped LLMs and personalization systems used by millions of users across Office, Bing, and Grammarly. His work on personalization drove 300% retention growth for Grammarly Keyboard and scaled models to 40M+ DAUs. He has published at top ML conferences such as ACL and holds multiple patents in real-time ML.

Harsha was a Tech Lead at Microsoft, where he architected the core stack of the Azure SDK, powering 400+ cloud services and 100M+ weekly downloads on npm. He also built foundational open-source tools and large-scale test automation systems. Earlier, he ranked among the nation’s best in Olympiads like the IMO and UCO; he combines analytical rigor with large-scale engineering expertise.

The idea for Moss came from our deep frustration with how slow “intelligent” systems actually felt in practice. While building large-scale agentic systems at Microsoft and Grammarly, we kept hitting the same wall - retrieval lag that made even the smartest models feel lifeless. Through evolution, humans are wired to expect instant replies; when AI hesitates, it breaks the illusion of intelligence. We started Moss to fix that by collapsing the multi-hop retrieval stack into a real-time, local-first runtime that lets AI think and respond at the speed of thought.

https://www.youtube.com/watch?v=7-PrunZVXTo

Who is it for?

  • Platform/Founding Eng: Drop in real-time semantic search. No new service to run.
  • Infra Leads: Local retrieval to slash p95 and egress.
  • Agent/Voice PMs: Hit <10 ms so conversations feel natural.
  • Security/Compliance: Keep sensitive context local (offline-ready).
  • Data/ML Eng: A/B embeddings & index configs on your corpus - fast.

How to onboard?

  • Log in to the portal - https://usemoss.dev

  • Start a new Project and create an index through either the portal or the JavaScript and Python SDKs


  • Using our SDKs, init, load, and query indexes in sub-10 ms.

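The post doesn’t document the SDK surface, but the init → load → query flow above can be sketched against a local stand-in. The `MossClient` class and its method names here are assumptions for illustration, not the real Moss SDK, and the stub keeps everything in memory so the example runs offline:

```python
# Hypothetical sketch of the init/load/query onboarding flow.
# "MossClient", "load_index", and "query" are assumed names, not the
# documented Moss SDK; this stub stands in for the managed data layer.
class MossClient:
    """Stand-in client: init once, load an index, then query locally."""

    def __init__(self, api_key):
        self.api_key = api_key
        self._indexes = {}

    def load_index(self, name, docs):
        # The real SDK would fetch a prebuilt index from the managed
        # layer at usemoss.dev; here we just keep the docs in memory.
        self._indexes[name] = docs

    def query(self, name, text, k=1):
        # Toy relevance: count shared words. The real runtime does
        # semantic (vector) matching in-process.
        q = set(text.lower().split())
        docs = self._indexes[name]
        scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
        return scored[:k]

client = MossClient(api_key="demo-key")          # 1. init
client.load_index("support", [                   # 2. load
    "How do I reset my password",
    "Shipping takes 3-5 business days",
])
hits = client.query("support", "reset password") # 3. query, in-process
print(hits[0])
```

The shape to notice is that after the one-time load, every query is a local call against an index already resident in the agent’s runtime.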

Our Progress

We’re seeing strong inbound pull from the market with 6 enterprise design partners and 3 paying customers actively building their products around Moss’s core tech, with 7 more actively evaluating. We’re working closely with Voice AI orchestration companies like Pipecat (Daily.co) and LiveKit, embedding Moss at the core of their real-time retrieval and context pipelines. Usage and revenue have been growing ~100% week over week, and Moss is quickly becoming the foundational layer teams rely on to make AI feel instant, contextual, and truly responsive.

Ask

  • Building in conversational or voice AI? Use Moss to bring real-time semantic search inside your agents into production.
  • Developers - Get free access to experiment with Moss and help shape the future of real-time conversational AI.

Our contact information -

Moss
Founded: 2024
Batch: Fall 2025
Team Size: 3
Status: Active
Location: San Mateo, CA
Primary Partner: Pete Koomen