
Long‑horizon RL environments for frontier AI labs.

Realistic, long-horizon environments for code, computer-use, and enterprise workflows that challenge SOTA models.

What we build

RL environments, custom and off-the-shelf.

Executable worlds with programmatic graders across three frontiers, hard enough that today’s best models fail most tasks.

Coding

Multi-file repos with build, run and test loops. Agents plan, edit, execute and repair across long sessions, graded against SWE-bench.

Computer-use

Full desktop and browser control, judged on the end state of long, multi-step tasks. Benchmarked on OSWorld.

Enterprise workflows

CRMs, spreadsheets, ticketing and finance — the real work companies run, with custom graders on your data.

Coding · sample environments
Pass@8 mean · SOTA models · 20 tasks
Claude Opus 4.8: 31.4%
GPT 5.5: 27.2%
Sample environments · shared with customers
rails-ecommerce-bugs: Rails · Spree
react-frontend-bugs: React · Redux
typescript-video-impl: TypeScript
Difficulty: Pass@8 11–37% · eight trials (K=8), graded reward
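One possible reading of the Pass@8 figures, as a minimal sketch: each task is run eight times, each trial yields a graded reward between 0 and 1, and a task counts as solved if any trial reaches a pass threshold. The page does not define the aggregation, so the threshold, the function names, and the sample rewards below are all hypothetical.

```python
def task_passes(rewards, threshold=1.0):
    """Assumed pass rule: any trial's graded reward reaches the threshold."""
    return any(r >= threshold for r in rewards)


def pass_at_k_mean(trials_by_task, k=8, threshold=1.0):
    """Fraction of tasks solved at least once within the first k trials."""
    if not trials_by_task:
        raise ValueError("no tasks to score")
    solved = 0
    for task, rewards in trials_by_task.items():
        if len(rewards) < k:
            raise ValueError(f"{task} has fewer than {k} trials")
        if task_passes(rewards[:k], threshold):
            solved += 1
    return solved / len(trials_by_task)


# Hypothetical graded rewards for four tasks, eight trials each.
example = {
    "task-a": [0.0, 0.2, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0],  # solved on trial 4
    "task-b": [0.0] * 8,                                  # never solved
    "task-c": [0.4, 0.6, 0.0, 0.0, 0.3, 0.0, 0.0, 0.9],  # partial credit only
    "task-d": [0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],  # never solved
}

score = pass_at_k_mean(example)
print(f"Pass@8 mean: {score:.1%}")  # prints "Pass@8 mean: 25.0%"
```

Under this reading, a lower threshold would count partial-credit trials such as those in task-c as passes, so the reported number depends on where that line is drawn.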
Computer-use · sample environments
Pass@8 mean · SOTA models · 33 environments
Claude Opus 4.8: 22.5%
GPT 5.5: 18.0%
Sample environments · shared with customers
research-agent: multi-step
message-triage: inbox
content-publishing: web apps
Graded on the end state · eight trials, continuous reward
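End-state grading with a continuous reward can be pictured as a set of checks run against the final state of the environment, with the reward being the fraction that hold. This is only an illustrative sketch: the state layout, the check list, and the `grade_end_state` helper are hypothetical and are not taken from the actual graders.

```python
def grade_end_state(final_state, checks):
    """Return the fraction of checks satisfied by the final state, in [0, 1]."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks if check(final_state))
    return passed / len(checks)


# Hypothetical end state of an inbox-triage task after the agent finishes.
final_state = {
    "archived": {"newsletter-12"},
    "flagged": {"invoice-7"},
    "replied": set(),
}

# Each check inspects only the end state, not the steps taken to reach it.
checks = [
    lambda s: "newsletter-12" in s["archived"],
    lambda s: "invoice-7" in s["flagged"],
    lambda s: "customer-3" in s["replied"],       # fails: no reply was sent
    lambda s: "invoice-7" not in s["archived"],
]

reward = grade_end_state(final_state, checks)
print(reward)  # prints 0.75
```

Because only the end state is inspected, two agents that take different routes to the same final inbox receive the same reward.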
Enterprise workflows · sample environments
Pass@8 mean · SOTA models · 33 environments
Claude Opus 4.8: 19.6%
GPT 5.5: 15.3%
Sample environments · shared with customers
access-review-quarterly: access
quarterly-tax-prep: finance
customer-escalation: support
Custom graders on your data · eight trials, continuous reward

What makes us different.

Verifiable data quality
Expert network
Huzzle Labs
Data quality

Verifiable downstream model improvements

h-1, our 8B computer-use model, is trained entirely on our own computer-use environments — and ranks #9 on OSWorld, beside models many times its size.

Holo3-35B-A3B: 82.6
MiniMax M3: 75.2
h-1 · 8B: 57.0
Scale

300k+ expert network

Built on Huzzle.com. Our AI recruiter sources vetted specialists for any domain, on demand.

100k monthly active · 88 expert NPS

The result — hundreds of high-quality tasks per week. Thousands per month.

Comparison
RL-env startups: Deeptune · Mechanize
Human-data co’s: Mercor · Scale
Focus on RL environments
Expert access & operational scale
Get started

Request sample data.

Tell us what you’d like to see and we’ll tailor the sample to you.

or talk to the founders