TL;DR: AirCaps is bringing AI assistance to in-person conversations.
Our AI copilot provides live captions, translations, AI meeting notes, and insights for in-person conversations in real time. It's deployed as an app for lightweight AR glasses, so visual information is overlaid directly onto your field of vision.
You can think of it like a Zoom AI meeting assistant or Granola, but for in-person conversations, and with the ability to proactively help you in real time, not just after the conversation.
We’re building the capture and intelligence layer for the 216 billion daily conversations that happen face-to-face.
We’ve already transcribed 16,500 hours of in-person conversations and counting. Today, AirCaps assists with 11% of our users’ in-person conversations.
We did $93K in revenue in October, growing 6.5x from September while spending under $3K on marketing (and we hit $70K in revenue in the first two weeks of November). Our power users average 6+ hours of daily usage, and our day-30 retention is 91%.
We previously went viral (75M+ views on TikTok, 150K+ followers) and have been featured in The New Yorker, WIRED, and Forbes.
The Problem
The average person struggles to understand and retain 50% of in-person conversations, forgets 70% of them within 48 hours, and has no way to review any of them.
We each have 20-30 in-person conversations daily, meaning humanity generates ~216 billion in-person conversations (averaging 10 minutes each, or 36 billion hours of content) every single day.
While virtual meetings have captions, recordings, and hundreds of AI assistants, these in-person conversations remain completely unassisted by technology. Why?
Our Solution
We're finally bringing AI to real-world conversations. We’re deploying our software on a discreet, socially acceptable, always-on, and hands-free visual interface: AR glasses.
Our earliest customers fall into two groups: people who need help understanding conversations (captions for real life) or translating languages in in-person business meetings, and meeting-heavy professionals (healthcare workers, executives, salespeople) who need real-time AI assistance during high-stakes conversations but can't glance at a screen without breaking eye contact.
We’re building the killer app for the next platform: think Spotify or Instagram for smartphones, before everyone had one.
AR glasses are the only form factor that works for in-person conversations.
The Team
Madhav and Nirbhay have been obsessed with voice technology and AR for 11 years and met at an AR hackathon in the summer of 2024.
Madhav (CEO, Yale CS) built his first Google Glass apps at age 13 and his first Jarvis-style voice assistant for Raspberry Pis at age 14. He researched audio AI at the MIT Media Lab, where he was part of the team that built the world's first live AI-human musical co-performance. His Yale thesis focused on extracting clean transcripts from noisy multi-speaker audio.
Nirbhay (CTO, Cornell CS) started building voice AI on smart glasses in high school, developing an emotional support tool for people with autism. He previously built voice and conversational AI platforms for therapy as the first hire at several early-stage startups, including YC companies.
Why Now
There’s never been a better time for us: AI is becoming faster and more accurate at speech, and people are starting to default to AI assistance for conversations, with 3 out of every 4 professionals using AI notetakers for virtual meetings. Not to mention, Meta, Apple, and Google are investing $200B+ to ensure everyone wears AR glasses.
Ask
If you know doctors who use medical transcription services, or organizations that rely on field sales (real estate, home services, retail), please connect us!