
Hey everyone, I’m Otso Veisterä, founder of The Token Company (YC W26).
TL;DR
We build LLM input compression using a fast machine learning model (not a generative LLM). It removes useless tokens from prompts via a drop-in API—cutting token counts, latency, and costs while improving model performance. It compresses 100k tokens in under 100ms.
The Problem
Prompts bloat with redundant tokens from chat history, RAG docs, or large inputs. This wastes context windows, drives up costs, slows inference, and weakens model outputs. Heavy users face high bills and must limit context to stay affordable.
The Solution
Our API compresses prompts intelligently, preserving semantic intent while stripping noise. Integration takes minutes.
Benchmarks show accuracy gains (e.g., +2.7 percentage points on financial QA with up to 20% fewer tokens) and up to 37% faster end-to-end latency.
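To make the "drop-in" pattern concrete, here is a minimal sketch of where compression slots into an existing pipeline: preprocess the prompt, then send the smaller prompt to whatever model you already use. The `compress_prompt` function below is a toy local stand-in (whitespace collapsing and duplicate-line removal), not the real API; the actual service, endpoint, and parameters are assumptions and are only indicated in comments.

```python
# Sketch of the drop-in compression pattern: preprocess the prompt,
# then call your existing LLM provider with the smaller prompt.
# compress_prompt is a toy local stand-in, NOT the real API: it only
# collapses repeated whitespace and drops duplicate lines, to show
# where a real compressor would slot into the pipeline.

def compress_prompt(prompt: str) -> str:
    seen = set()
    kept = []
    for line in prompt.splitlines():
        normalized = " ".join(line.split())  # collapse runs of whitespace
        if normalized and normalized not in seen:  # drop blanks and repeats
            seen.add(normalized)
            kept.append(normalized)
    return "\n".join(kept)

raw = """
You are a helpful assistant.
You are a helpful assistant.
Context   document:   revenue grew 12% in Q3.
Context   document:   revenue grew 12% in Q3.
Question: what was Q3 revenue growth?
"""

small = compress_prompt(raw)
print(f"{len(raw)} chars -> {len(small)} chars")

# The compressed prompt is then sent to your model provider as usual,
# e.g. (hypothetical client, shown only to indicate the call site):
# response = client.chat.completions.create(
#     model="...",
#     messages=[{"role": "user", "content": small}],
# )
```

The key design point is that compression happens entirely before the model call, so no change to the downstream provider, model, or response handling is required.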
My Ask
If you're running production LLMs and fighting context bloat, high costs, or latency, email me: otso@thetokencompany.com
https://thetokencompany.com
I didn’t apply for the winter batch, but after the YC partners reached out and spoke with me, it seemed like the obvious choice. The batch has been awesome so far, and I’ve learned so much. Can’t wait for what’s to come :)
Context bloat and inefficient LLM inputs. We preprocess inputs to remove useless tokens from the context, which results in faster, cheaper, and more accurate LLM answers.
Today this is done almost nowhere in practice, which means billions of dollars and quadrillions of tokens are being wasted.
I decided to work on this because it is much more fundamental than an ordinary application-layer company: it affects every one of them. And I like hard problems ;)
The vision is to optimize every LLM request in the world at the token level before it reaches a model.
Even if model labs dominate, there will still be multiple competing providers. We sit above them as a neutral efficiency layer, optimizing inputs across all models. Labs are incentivized to build bigger models and sell more tokens, not reduce usage across the ecosystem. As long as more than one lab exists, a cross-model optimization layer remains structurally defensible.