Blog
18 June 2026
Teaching an Agent From Outcomes: Reinforcement Learning for Multi-Step AI Processes
Prompt tuning and fine-tuning both require labelled examples of correct behaviour. But for complex multi-step agent workflows, you can often say only whether the overall outcome was good. Reinforcement learning is exactly the right tool for that setting.
Blog
17 June 2026
Fine-Tuning a Small Model on Your Data: What It Takes and What You Get
A fine-tuned 7B model trained on your domain data will outperform a frontier model on generic prompts for well-defined tasks — consistently, cheaply, and without sending your data to a third-party endpoint. Here is what the process actually involves.
Blog
16 June 2026
Prompt Tuning Without the Guesswork: How Genetic Optimisation Replaces Manual Iteration
Manual prompt engineering has no real feedback loop — you iterate by feel, test on a handful of examples, and hope it generalises. Genetic optimisation replaces that process with a systematic search over production traces. Here is how it works.
Blog
15 June 2026
Three Ways to Make AI Better at Your Job: Prompt Tuning, Fine-Tuning, and Reinforcement Learning
There are three distinct strategies for making an AI model better at a specific job. Each works differently, costs differently, and produces a different kind of asset. Here is how to choose.
Blog
14 June 2026
Why 'Human in the Loop' Is Broken — and What to Do Instead
Human-in-the-loop sounds safe. But it contains a structural flaw that guarantees the one genuinely dangerous decision gets the same shallow glance as the thousandth routine one.
Blog
13 June 2026
Two Ways to Measure What Your AI Doesn't Know
LLMs cannot reliably report their own uncertainty — so you have to measure it from the outside. Here are the two methods that work, and when to use each.
Blog
12 June 2026
The Only AI Metric That Actually Matters: The Cost of Being Wrong
Most AI deployments chase benchmark accuracy. But in production, value isn't destroyed by average errors — it's destroyed by the single ruinous tail event you didn't cap.
Blog
10 June 2025
Why Fine-Tuned Small Models Beat Prompt Engineering at Scale
Prompt engineering is a great starting point, but at production scale a fine-tuned 7B model running on your own infrastructure can outperform a frontier model on generic prompts for well-defined tasks. Here is why, and when to make the switch.