From AI adoption to advantage.
We evaluate the latest models on your real workflows, build a custom model you own, and improve it continuously.
We make AI work on your real workflows.
Evaluate
We build evaluations that assess how today’s leading models perform on your actual workflows, so you see exactly where each one stands before you commit.
Build
Custom-trained agentic models, on your data. We post-train an open model to match state-of-the-art (SOTA) model performance on constrained workflows, at about 2% of the cost, running on your own machines.
Improve
Keep the model improving after launch. New cases are caught in production and added to your training data, and the model is retrained on them, so it keeps getting better.
We built our own model to prove it.
h-1 is our own computer-use model. We trained a small 8B-parameter model on our own RL environments and put it on the public OSWorld leaderboard.
| Rank | Model | Organization | Score |
| --- | --- | --- | --- |
| #1 | Holo3-35B-A3B | H Company | 82.6% |
| #2 | MiniMax M3 | MiniMax | 75.2% |
| #3 | Qwen 3.7 Plus | Qwen Team, Alibaba Group | 73.3% |
| ⋮ | | | |
| #9 | h-1 (8B) | Huzzle Labs | 57.0% |
An 8B model holding its own beside models many times its size. See the full OSWorld benchmark →
We build evaluations with frontier AI labs, and work with enterprises across Europe.
We are based in Europe and not tied to an existing vendor partnership, so we work in your interest.
We design and train our own models rather than reselling someone else’s.
We train on your data and deploy on your own infrastructure, so your data and the models built on it stay yours.
Start with one workflow.
Take one workflow, like motor claims, and we will measure how today’s models perform on it against your own data. From there we can post-train a model you own that matches them at a fraction of the cost. Usually a few weeks.