Writing

Blog

Thinking on AI strategy, model optimisation, and building sovereign AI systems that compound in value over time.

Blog 18 June 2026

Teaching an Agent From Outcomes: Reinforcement Learning for Multi-Step AI Processes

Prompt tuning and fine-tuning both require labelled examples of correct behaviour. But for complex multi-step agent workflows, you can often only say whether the overall outcome was good. Reinforcement learning is exactly the right tool for that setting.

Blog 17 June 2026

Fine-Tuning a Small Model on Your Data: What It Takes and What You Get

For well-defined tasks, a 7B model fine-tuned on your domain data will outperform a generically prompted frontier model: consistently, cheaply, and without sending your data to a third-party endpoint. Here is what the process actually involves.

Blog 16 June 2026

Prompt Tuning Without the Guesswork: How Genetic Optimisation Replaces Manual Iteration

Manual prompt engineering has no real feedback loop — you iterate by feel, test on a handful of examples, and hope it generalises. Genetic optimisation replaces that process with a systematic search over production traces. Here is how it works.

Blog 15 June 2026

Three Ways to Make AI Better at Your Job: Prompt Tuning, Fine-Tuning, and Reinforcement Learning

There are three distinct strategies for making an AI model better at a specific job. Each works differently, costs differently, and produces a different kind of asset. Here is how to choose.

Blog 14 June 2026

Why 'Human in the Loop' Is Broken — and What to Do Instead

Human-in-the-loop sounds safe. But it contains a structural flaw that guarantees the one genuinely dangerous decision gets the same shallow glance as the thousandth routine one.

Blog 13 June 2026

Two Ways to Measure What Your AI Doesn't Know

LLMs cannot reliably report their own uncertainty — so you have to measure it from the outside. Here are the two methods that work, and when to use each.

Blog 12 June 2026

The Only AI Metric That Actually Matters: The Cost of Being Wrong

Most AI deployments chase benchmark accuracy. But in production, value isn't destroyed by average errors — it's destroyed by the single ruinous tail event you didn't cap.

Blog 10 June 2025

Why Fine-Tuned Small Models Beat Prompt Engineering at Scale

Prompt engineering is a great starting point — but at production scale, a fine-tuned 7B model running on your own infrastructure will outperform a frontier model on generic prompts every time. Here is why, and when to make the switch.