
Real-time B2B data via simple APIs
Skills: Python, PyTorch, NLP, LLMs, Information Retrieval, Entity Resolution, Text Classification
We're building the gateway to the internet for AI agents. Our APIs already power hundreds of customers — and we went from 0 to $7M ARR in our first 12 months. Now we need someone who can push the boundaries of what our ML systems can do.
We're hiring an ML Engineer Intern to work directly with our founding team on the research and engineering behind our core intelligence layer. Our platform indexes hundreds of millions of professional profiles and company records from across the web. Making that data searchable, matchable, and enriched is an ML problem at its core.
This is a 12-week summer internship (June–August 2026). You will not be fetching coffee or watching from the sidelines. You will be researching, training, and shipping models — from paper to prototype to production. Previous interns' work has shipped to customers within weeks.
You'll own real ML problems that turn messy, multilingual, web-scale data into structured intelligence. Some example problems:
The way information on the internet is consumed is changing. It's shifting from humans searching pre-crawled information on Google via point-and-click to AI agents doing real-time, targeted crawling from sources of truth. The interface, the workflow, and the density and fidelity of retrieved data that worked for humans don't work for AI agents.
At Crustdata, we are building the gateway to the internet for AI agents. In simple terms, we're building the APIs for AI agents to access real-time data from sources of truth. We already serve dozens of enterprise customers, are profitable, and are growing very fast. We're backed by some of the best investors in Silicon Valley, including Y Combinator, General Catalyst, SV Angel, A Capital, and Liquid 2 Ventures.