From AI adoption to advantage.

We evaluate the latest models on your real workflows, build a custom model you own, and improve it continuously.

What we do

We make AI work on your real workflows.

Evaluate

We build evaluations that assess how today’s leading models perform on your actual workflows, so you see exactly where each one stands before you commit.

Build

Custom-trained agentic models, on your data. We post-train an open model to match SOTA performance on constrained workflows, at about 2% of the cost, on your own machines.

Improve

Keep the model improving after launch. New cases get caught in production and added to your training data, and the model is retrained on them, so it keeps getting better.

Insurance · Motor claims · example

Leading models, scored on this workflow

Accuracy on this workflow, during training

Matches the best models at about 2% of the cost, on your own machines.

After launch

A new edge case slips once, then never again.

Case study

We built our own model to prove it.

HuzzleWorld-8B is our own computer-use model. We trained a small 8B model on our own RL environments and put it on the public board.

#11 on OSWorld

8B parameters

OSWorld · computer use

#1	Coasty CUA v1	Coasty Team	82.8%
#2	Holo3-35B-A3B	H Company	82.6%
#3	Muse Spark 1.1	Meta Superintelligence Labs	80.7%
⋮
#11	HuzzleWorld-8B	Huzzle Labs	57.0%

An 8B model holding its own beside models many times its size. See the full OSWorld benchmark →

About Huzzle Labs

We build evaluations with frontier AI labs, and work with enterprises across Europe.

European and independent

We are based in Europe and not tied to any vendor partnership, so we work in your interest.

In-house research and training

We design and train our own models rather than reselling someone else’s.

Full data autonomy

We train on your data and deploy on your own infrastructure, so your data and the models built on it stay yours.

Recognition

Fastest-growing company, UK & Ireland

Sifted · 2026

#14

Fastest-growing company, Europe

J.P. Morgan & Nebius · 2026

Backed by

10X Founders · Angel Invest · Emerge · a16z Scout Fund · Thomas Wolf (Hugging Face) · Bernd Heinemann (Allianz) · Verena Pausder (German Startup Assoc.) · Yaser Khalighi (Stanford)

A good first step

Start with one workflow.

Take one workflow, like motor claims, and we will measure how today’s models perform on it against your own data. From there we can post-train a model you own that matches them at a fraction of the cost. This usually takes a few weeks.

Talk to the founders, or email us →