
Inference for real-time production workloads

Pipeshift helps engineering teams run real-time inference in production. We offer optimized runtimes to hit latency/throughput SLAs, paired with infrastructure orchestration that auto-scales and routes workloads across clusters and regions.
Active Founders
Arko C
Co-founder, CEO
CEO @ Pipeshift. Building scalable infrastructure for open source AI workloads.
Enrique Ferrao
Founder
CTO @ Pipeshift. Focused on squeezing out max LLM performance from GPUs.
Pranav Reddy
Founder
CIO @ Pipeshift. Making LLMs go brrrr at Pipeshift.
Company Launches
Pipeshift AI - Fine-tuning and inference for open-source LLMs
See original launch post

TL;DR: Pipeshift is the cloud platform for fine-tuning and running inference on open-source LLMs, helping teams get to production with their LLMs faster than ever. With Pipeshift, companies making >1,000 calls/day to frontier LLMs can use their data and logs to replace GPT/Claude with specialized LLMs that offer higher accuracy, lower latency, and model ownership. Connect with us.


🧨 The Problem: Building with Open-source LLMs is hard!

There is no complete open-source AI stack: most teams experiment by duct-taping tools like TGI or vLLM together, with nothing ready for production. As you scale, this requires expensive ML talent, long build cycles, and constant optimization.

The gap between open-source and closed-source models is shrinking (Meta's Llama 3.1 405B is a testament to that)! And open-source LLMs offer multiple benefits over their closed-source counterparts:

🔏 Model ownership and IP control
🎯 Verticalization and customizability
🏎️ Improved inference speeds and latency
💰 Reduction of API costs at scale

🎉 The Solution: Heroku/Vercel for Open-source LLMs

Pipeshift is the cloud platform for fine-tuning and running inference on open-source LLMs, helping developers get to production with their LLMs faster than ever.

🎯 Fine-tune Specialized LLMs
Run multiple LoRA-based fine-tuning jobs to build specialized LLMs.
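The idea behind LoRA is that the frozen base weights W stay untouched while only a low-rank update B·A is trained, which is why many fine-tuning jobs can run cheaply side by side. A minimal NumPy sketch of the parameter math (dimensions and rank are illustrative, not Pipeshift's API):

```python
import numpy as np

# Frozen base weight matrix of a hypothetical 4096x4096 projection layer.
d = 4096
W = np.random.randn(d, d).astype(np.float32)

# LoRA trains two small factors B (d x r) and A (r x d) with rank r << d,
# so the effective weight is W + B @ A while W itself stays frozen.
r = 8
A = np.random.randn(r, d).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)  # B starts at zero: the update is a no-op before training

W_effective = W + B @ A

# Trainable parameters drop from d*d to 2*d*r (here: ~16.8M -> 65,536).
full_params = d * d
lora_params = d * r + r * d
print(full_params, lora_params, lora_params / full_params)
```

Because each adapter is just the small (B, A) pair, many specialized LLMs can share one base model, which is what makes running multiple fine-tuning jobs in parallel practical.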

⚡️ Serverless APIs of Base and Fine-tuned LLMs
Run inference on base and fine-tuned LLMs and pay per token used.
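Per-token billing makes serverless costs easy to estimate up front. A quick sketch with hypothetical rates (these numbers are illustrative, not Pipeshift's actual pricing):

```python
# Hypothetical USD rates per million tokens -- NOT Pipeshift's actual pricing.
PRICE_PER_M_INPUT = 0.20
PRICE_PER_M_OUTPUT = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate serverless inference cost for one request under per-token billing."""
    return (input_tokens / 1e6) * PRICE_PER_M_INPUT + (output_tokens / 1e6) * PRICE_PER_M_OUTPUT

# A team making 1,000 calls/day, each ~2,000 input and ~500 output tokens:
daily_cost = 1000 * estimate_cost(2000, 500)
print(f"${daily_cost:.2f}/day")  # $0.70/day at these illustrative rates
```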

🏎️ Dedicated Instances for High Speed and Low Latency
Use our optimized inference stack to get maximum throughput and utilization on GPUs.

Product Demo: https://youtu.be/z8z5ILyXxCI

Our inference stack is among the best globally, hitting 150+ tokens/sec on 70B-parameter LLMs without any model quantization. Since we opened private beta access less than two weeks ago, we have already seen 25+ LLMs fine-tuned on over 1.8B tokens of training data across 15+ companies.
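For intuition, the quoted 150 tokens/sec decode throughput can be converted into per-token latency and end-to-end generation time (the 500-token completion length below is an illustrative assumption, not a figure from the launch post):

```python
# Implied latency from the quoted 150 tokens/sec decode throughput.
tokens_per_sec = 150
ms_per_token = 1000 / tokens_per_sec  # ~6.67 ms per generated token

# Hypothetical 500-token completion, chosen only for illustration.
completion_tokens = 500
seconds_for_completion = completion_tokens / tokens_per_sec  # ~3.33 s
print(round(ms_per_token, 2), round(seconds_for_completion, 2))
```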


👋 Ask: How you can help

If you’re building an AI co-pilot, agent, or SaaS product and are looking to move to open-source LLMs (or know someone who is), book a call or email us at founders@pipeshift.ai - whichever you’d like!

Pipeshift
Founded: 2024
Batch: Summer 2024
Team Size: 10
Status: Active
Location: San Francisco
Primary Partner: Tyler Bosmeny