Sylvian: Data for LLMs through Competition

Sylvian incentivizes high-quality expert data for LLMs through competition

William Huang

a month ago

#b2b #reinforcement_learning #artificial_intelligence

We are William Huang and Niall Kehoe, and we’re launching Sylvian! We gather expert data for LLMs through competitions, starting with tool use (e.g. Excel, VSCode).

Problem

Scaling laws dictate that LLMs continue to need lots of expert data, despite advances in RL. Even RL environments require data. For example, an environment that lets LLMs edit spreadsheets in Excel would still need sample spreadsheets to define the task at hand.

Unfortunately, the part-time pay offered by existing data vendors like Scale or Mercor does not motivate the best experts.

Our Solution

Sylvian hosts competitions where the best experts are motivated by the thrill of moving up leaderboards and the prestige of winning a large competition. We’re starting with tool use data because there are many expert communities surrounding tools like Excel and VSCode.

We already have 4,500+ experts, from IMO gold medalists to MIT/Stanford PhDs to full-time quant researchers (QRs) at Point72, producing data at 1B tokens/week!

Our data is consistently at the frontier. Below are the results of one of our latest benchmarks, built with VSCode data from data science experts. See sylvian.ai for more details.

[Image: benchmark results]

Our Story

William won an IPhO gold medal at 17 and Niall won international coding contests at 13. Since then, we’ve gone on to be part of Stanford CS, Harvard Medical School, Citadel Securities, Two Sigma, and Waymo.

Our Ask

If you need expert tool-use data, we’d love to speak with you! You can reach us at founders@sylvian.ai or visit sylvian.ai.