
Inference for real-time production workloads

Pipeshift helps engineering teams run real-time inference in production. We offer optimized runtimes to hit latency/throughput SLAs, paired with infrastructure orchestration that auto-scales and routes workloads across clusters and regions.
Active Founders
Arko C
Co-founder, CEO
CEO @ Pipeshift. Building scalable infrastructure for open source AI workloads.
Enrique Ferrao
Founder
CTO @ Pipeshift. Focused on squeezing out max LLM performance from GPUs.
Pranav Reddy
Founder
CIO @ Pipeshift. Making LLMs go brrrr at Pipeshift.
Company Launches
Pipeshift AI - Fine-tuning and inference for open-source LLMs
See original launch post

TL;DR: Pipeshift is the cloud platform for fine-tuning and running inference on open-source LLMs, helping teams get to production with their LLMs faster than ever. With Pipeshift, companies making >1,000 calls/day to frontier LLMs can use their data and logs to replace GPT/Claude with specialized LLMs that offer higher accuracy, lower latency, and model ownership. Connect with us.


🧨 The Problem: Building with Open-source LLMs is hard!

There is no complete open-source AI stack: most teams experiment by duct-taping tools like TGI or vLLM together, with nothing ready for production. As you scale, this requires expensive ML talent, long build cycles, and constant optimization.

The gap between open-source and closed-source models is shrinking (Meta's Llama 3.1 405B is a testament to that)! And open-source LLMs offer multiple benefits over their closed-source counterparts:

🔏 Model ownership and IP control
🎯 Verticalization and customizability
🏎️ Improved inference speeds and latency
💰 Reduction of API costs at scale

🎉 The Solution: Heroku/Vercel for Open-source LLMs

Pipeshift is the cloud platform for fine-tuning and running inference on open-source LLMs, helping developers get to production with their LLMs faster than ever.

🎯 Fine-tune Specialized LLMs
Run multiple LoRA-based fine-tuning jobs to build specialized LLMs.
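The idea behind LoRA is that the frozen base weights W stay untouched while only a low-rank update B·A is trained, which is why many fine-tuning jobs can run cheaply side by side. A minimal NumPy sketch of the parameter math (dimensions and rank are illustrative, not Pipeshift's API):

```python
import numpy as np

# Frozen base weight matrix of a hypothetical 4096x4096 projection layer.
d = 4096
W = np.random.randn(d, d).astype(np.float32)

# LoRA trains two small factors B (d x r) and A (r x d) with rank r << d,
# so the effective weight is W + B @ A while W itself stays frozen.
r = 8
A = np.random.randn(r, d).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)  # B starts at zero: the update is a no-op before training

W_effective = W + B @ A

# Trainable parameters drop from d*d to 2*d*r (here: ~16.8M -> 65,536).
full_params = d * d
lora_params = d * r + r * d
print(full_params, lora_params, lora_params / full_params)
```

Because each adapter is just the small (B, A) pair, many specialized LLMs can share one base model, which is what makes running multiple fine-tuning jobs in parallel practical.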

⚡️ Serverless APIs of Base and Fine-tuned LLMs
Run inference on base and fine-tuned LLMs and pay per token used.
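Per-token billing makes serverless costs easy to estimate up front. A quick sketch with hypothetical rates (these numbers are illustrative, not Pipeshift's actual pricing):

```python
# Hypothetical USD rates per million tokens -- NOT Pipeshift's actual pricing.
PRICE_PER_M_INPUT = 0.20
PRICE_PER_M_OUTPUT = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate serverless inference cost for one request under per-token billing."""
    return (input_tokens / 1e6) * PRICE_PER_M_INPUT + (output_tokens / 1e6) * PRICE_PER_M_OUTPUT

# A team making 1,000 calls/day, each ~2,000 input and ~500 output tokens:
daily_cost = 1000 * estimate_cost(2000, 500)
print(f"${daily_cost:.2f}/day")  # $0.70/day at these illustrative rates
```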

🏎️ Dedicated Instances for High Speed and Low Latency
Use our optimized inference stack to get maximum throughput and utilization on GPUs.

Product Demo: https://youtu.be/z8z5ILyXxCI

Our inference stack is among the best globally, hitting 150+ tokens/sec on 70B-parameter LLMs without any model quantization. Since we opened private beta access less than two weeks ago, we have already seen 25+ LLMs fine-tuned on over 1.8B tokens of training data across 15+ companies.
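For intuition, the quoted 150 tokens/sec decode throughput can be converted into per-token latency and end-to-end generation time (the 500-token completion length below is an illustrative assumption, not a figure from the launch post):

```python
# Implied latency from the quoted 150 tokens/sec decode throughput.
tokens_per_sec = 150
ms_per_token = 1000 / tokens_per_sec  # ~6.67 ms per generated token

# Hypothetical 500-token completion, chosen only for illustration.
completion_tokens = 500
seconds_for_completion = completion_tokens / tokens_per_sec  # ~3.33 s
print(round(ms_per_token, 2), round(seconds_for_completion, 2))
```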


👋 Ask: How you can help

If you’re building an AI co-pilot, agent, or SaaS product and are looking to move to open-source LLMs (or know someone who is), book a call or email us at founders@pipeshift.ai - whichever you’d like!

Pipeshift
Founded: 2024
Batch: Summer 2024
Team Size: 10
Status: Active
Location: San Francisco
Primary Partner: Tyler Bosmeny