RunAnywhere

The default way of running on-device AI at scale

Edge AI is inevitable, but shipping it is painful: every device class behaves differently, runtimes vary, models are huge, and performance collapses under memory/power constraints. RunAnywhere turns that into an enterprise-ready workflow: one SDK to run models on-device, plus a control plane to manage models, enforce policies, and measure outcomes across thousands of devices.
Active Founders
Sanchit Monga
Co-founder & CEO
Former Intuit engineer building RunAnywhere, the infrastructure layer for deploying fast, private, multimodal AI on-device at scale. Deep background in mobile SDKs, platform tooling, and developer products, including systems used by 50M+ active users. Previously founded products across consumer discovery, context management, agentic documentation, and mobile testing, and now focused on making on-device AI production-ready across mobile, edge, and embedded devices.
Shubham Malhotra
Co-founder & CTO
Co-founder & CTO of RunAnywhere (W26). Built MetalRT, the first complete multimodal inference engine for Apple Silicon, with custom Metal GPU kernels that cut on-device voice AI latency from 900ms to ~110ms. Ex-Amazon EC2 Spot ($100M+ ARR), ex-Microsoft Azure. Peer-reviewed researcher.
Company Launches
RunAnywhere: The default way to run on-device AI at scale
See original launch post

We're Sanchit and Shubham, co-founders of RunAnywhere (W26).

TL;DR: Run multimodal AI fully on-device with one SDK and manage model rollouts + policies from a control plane.

We are already live and open source with ~10.1k stars on GitHub.

https://youtu.be/N3x2bs4ri68


The Problem

Edge AI is inevitable — users want instant responses, full privacy (health, finance, personal data), and AI that actually works on planes, subways, or spotty rural connections.

But shipping it today is brutal:

  • Every device class (iPhone 14 vs. Android flagship vs. low-end hardware) has wildly different memory, thermal limits, and accelerators.
  • Teams waste quarters rebuilding model delivery (download/resume/unzip/versioning), lifecycle management (load/unload without crashing), multi-engine wrappers (llama.cpp, ONNX, etc.), and cross-platform bindings.
  • No real observability: you're blind to fallback rates, per-device performance, and crashes tied to a model version.

Result: most teams either give up on local AI or ship a brittle, hacked-together experience.

The Solution: Complete AI Infrastructure

RunAnywhere isn't just a wrapper around a model. It is a full-stack infrastructure layer for on-device intelligence.

1. The "Boring" Stuff Is Built In

We provide a unified API that handles model delivery (downloading with resume support), extraction, and storage management. You don't need to build a file server client inside your app.
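The resume part of model delivery can be pictured in a few lines. This is an illustrative TypeScript sketch, not the RunAnywhere API; the function name and shape are assumptions:

```typescript
// Hypothetical sketch of download resume: given how many bytes of a model
// archive are already on disk, build the HTTP headers for continuing the
// transfer. A server that honors Range replies "206 Partial Content" and
// sends only the missing tail of the file.
export function resumeHeaders(bytesOnDisk: number): Record<string, string> {
  // A fresh download sends no Range header and receives the whole file.
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}
```

The real SDK also has to verify the partial file (e.g. against a checksum) before trusting it, but the Range-request idea above is the core of resumable delivery.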

2. Multi-Engine & Cross-Platform

We abstract away the inference backend. Whether it's llama.cpp, ONNX Runtime, or another engine, you use one standard SDK across:

  • iOS (Swift)
  • Android (Kotlin)
  • React Native
  • Flutter
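The multi-engine idea can be sketched as one interface plus backend selection. A hedged illustration; the interface, function names, and the format-to-engine mapping are assumptions, not RunAnywhere's actual surface:

```typescript
// Illustrative multi-engine abstraction (not the real RunAnywhere API):
// apps code against one interface, and the SDK picks a backend engine
// from the model's file format.
export interface InferenceEngine {
  name: string;
  generate(prompt: string): Promise<string>;
}

// Assumed mapping for illustration: GGUF models run on llama.cpp,
// ONNX models on ONNX Runtime.
export function backendFor(modelFile: string): string {
  if (modelFile.endsWith(".gguf")) return "llama.cpp";
  if (modelFile.endsWith(".onnx")) return "onnxruntime";
  throw new Error(`no known engine for ${modelFile}`);
}
```

The point of the abstraction is that switching a model from one format to another changes which backend loads it, not any application code.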

3. Hybrid Routing (The Control Plane)

We believe the future isn't local-only; it's hybrid. RunAnywhere lets you define policies: try to run the request locally for zero latency and full privacy; if the device is too hot, too old, or the model's confidence is low, automatically route the request to the cloud.
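A routing policy like the one described could look like the following sketch. All field names and thresholds are invented for illustration; in RunAnywhere's model they would be supplied remotely by the control plane, not hard-coded:

```typescript
// Hedged sketch of hybrid local/cloud routing. The thresholds here are
// illustrative assumptions, not RunAnywhere defaults.
export interface DeviceState {
  thermalThrottled: boolean; // OS reports thermal pressure
  totalRamMb: number;        // proxy for "device too old/small"
}

export type Route = "local" | "cloud";

export function routeRequest(
  device: DeviceState,
  localConfidence: number | null, // null before any local attempt
): Route {
  if (device.thermalThrottled) return "cloud";  // too hot: offload
  if (device.totalRamMb < 4096) return "cloud"; // too constrained: offload
  if (localConfidence !== null && localConfidence < 0.5) {
    return "cloud";                             // low confidence: escalate
  }
  return "local";                               // default: private, no network latency
}
```

Because the policy is plain data plus a decision function, the control plane can tighten or loosen it fleet-wide without shipping a new app build.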

Voice AI Pipeline Demo

Try our demo apps:

Our Ask

We're in full execution mode post-launch and hunting design partners + early feedback:

  • Building voice AI, offline agents, privacy-sensitive features (health/enterprise/consumer), or hybrid chat in your mobile/edge app?
  • Want to eliminate cloud inference costs for repetitive queries while keeping complex ones fast?
  • Have a fleet where OTA model updates + observability would save you engineering months?

Get in touch:

Excited to hear what you're building and how we can make on-device AI actually shippable at scale.

RunAnywhere
Founded: 2025
Batch: Winter 2026
Team Size: 2
Status: Active
Primary Partner: Diana Hu