
From AI adoption to advantage.

We evaluate the latest models on your real workflows, build a custom model you own, and improve it continuously.

What we do

We make AI work on your real workflows.

Evaluate

We build evaluations that assess how today’s leading models perform on your actual workflows, so you see exactly where each one stands before you commit.

Build

Custom-trained agentic models, on your data. We post-train an open model to match SOTA model performance on constrained workflows, at about 2% of the cost, running on your own machines.

Improve

Keep the model improving after launch. New cases are caught in production and added to your training data, and the model is retrained, so it keeps getting better.

Insurance · Motor claims · example
Leading models, scored on this workflow
Accuracy on this workflow, during training
Best off-the-shelf · 89%
Open model, base · 45%
Custom-trained · 82%
(x-axis: training steps)
Matches the best models at about 2% of the cost, on your own machines.
After launch
Continuous improvement loop: Production → New edge case → Added to training data → Retrained → Redeployed
A new edge case slips through once, then never again.
Case study

We built our own model to prove it.

h-1 is our own computer-use model. We trained a small 8B-parameter model on our own RL environments and put it on the public OSWorld leaderboard.

#9 on OSWorld
8B parameters

An 8B model holding its own beside models many times its size. See the full OSWorld benchmark →

About Huzzle Labs

We build evaluations with frontier AI labs, and work with enterprises across Europe.

European and independent

Based in Europe and not tied to an existing vendor partnership, so we work in your interest.

In-house research and training

We design and train our own models rather than reselling someone else’s.

Full data autonomy

We train on your data and deploy on your own infrastructure, so your data and the models built on it stay yours.

Recognition
#7 · Fastest-growing company, UK & Ireland · Sifted · 2026
#14 · Fastest-growing company, Europe · J.P. Morgan & Nebius · 2026
Backed by
10X Founders, Angel Invest, Emerge, a16z Scout Fund, Thomas Wolf (Hugging Face), Bernd Heinemann (Allianz), Verena Pausder (German Startup Assoc.), Yaser Khalighi (Stanford)
A good first step

Start with one workflow.

Take one workflow, like motor claims, and we will measure how today’s models perform on it against your own data. From there we can post-train a model you own that matches them at a fraction of the cost. Usually a few weeks.