
Hey everyone, I’m Otso Veisterä, founder of The Token Company (YC W26).
TL;DR
We build LLM input compression using a fast machine learning model (not a generative LLM). It removes useless tokens from prompts via a drop-in API—cutting token counts, latency, and costs while improving model performance. It compresses 100k tokens in under 100ms.
The Problem
Prompts bloat with redundant tokens from chat history, RAG docs, or large inputs. This wastes context windows, drives up costs, slows inference, and weakens model outputs. Heavy users face high bills and must limit context to stay affordable.
The Solution
Our API compresses prompts intelligently, preserving semantic intent while stripping noise. Integration takes minutes.
Benchmarks show accuracy gains (e.g., +2.7 percentage points on financial QA with up to 20% fewer tokens) and up to 37% faster end-to-end latency.
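To make the "drop-in" pattern concrete, here is a minimal sketch of where compression slots into an existing pipeline: preprocess the prompt, then send the smaller prompt to whatever model you already use. The `compress_prompt` function below is a toy local stand-in (whitespace collapsing and duplicate-line removal), not the real API; the actual service, endpoint, and parameters are assumptions and are only indicated in comments.

```python
# Sketch of the drop-in compression pattern: preprocess the prompt,
# then call your existing LLM provider with the smaller prompt.
# compress_prompt is a toy local stand-in, NOT the real API: it only
# collapses repeated whitespace and drops duplicate lines, to show
# where a real compressor would slot into the pipeline.

def compress_prompt(prompt: str) -> str:
    seen = set()
    kept = []
    for line in prompt.splitlines():
        normalized = " ".join(line.split())  # collapse runs of whitespace
        if normalized and normalized not in seen:  # drop blanks and repeats
            seen.add(normalized)
            kept.append(normalized)
    return "\n".join(kept)

raw = """
You are a helpful assistant.
You are a helpful assistant.
Context   document:   revenue grew 12% in Q3.
Context   document:   revenue grew 12% in Q3.
Question: what was Q3 revenue growth?
"""

small = compress_prompt(raw)
print(f"{len(raw)} chars -> {len(small)} chars")

# The compressed prompt is then sent to your model provider as usual,
# e.g. (hypothetical client, shown only to indicate the call site):
# response = client.chat.completions.create(
#     model="...",
#     messages=[{"role": "user", "content": small}],
# )
```

The key design point is that compression happens entirely before the model call, so no change to the downstream provider, model, or response handling is required.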
My Ask
If you're running production LLMs and fighting context bloat, high costs, or latency, email me: otso@thetokencompany.com
https://thetokencompany.com
I didn’t apply for the winter batch, but after the YC partners reached out and spoke with me, it seemed like the obvious choice. The batch has been awesome so far, and I’ve learned so much. Can’t wait for what’s to come :)
Context bloat and inefficient LLM inputs. We preprocess inputs to remove useless tokens from the context, which results in faster, cheaper, and more accurate LLM answers.
Today this is done almost nowhere in practice, which means billions of dollars and quadrillions of tokens are being wasted.
I decided to work on this because it is much more fundamental than an ordinary application-layer company: it affects every one of them. And I like hard problems ;)
The vision is to optimize every LLM request in the world at the token level before it reaches a model.
Even if model labs dominate, there will still be multiple competing providers. We sit above them as a neutral efficiency layer, optimizing inputs across all models. Labs are incentivized to build bigger models and sell more tokens, not reduce usage across the ecosystem. As long as more than one lab exists, a cross-model optimization layer remains structurally defensible.