Engine · build notes

Caïssa: a chess engine from scratch, and how it taught itself

No chess libraries. The rules, the search, the evaluation — all hand-written in Rust, compiled to WebAssembly, then handed back to itself to learn from.

2026-06-04 · source on GitHub · play it →

Caïssa is a chess engine I wrote from nothing — no move-generation crate, no search library, no evaluation borrowed from elsewhere. The point was to understand every layer by building it, then to make it do something most hobby engines don't: learn its own evaluation by playing itself. This is how the parts fit together, kept to the point.

1. The rules, and proving they're right

The board is the dullest possible thing — an 8×8 mailbox, [Option<Piece>; 64] — chosen for clarity over raw speed. Move generation is pseudo-legal generation plus a legality filter: generate every candidate move, make it, ask whether our own king is now attacked, and keep it only if not. One filter quietly handles pins, checks, en-passant edge cases, and self-checks without special-casing any of them.

The trap with move generation is that bugs are invisible — the engine just quietly plays an illegal move or misses a legal one. The fix is perft: walk the full move tree to depth N and count the leaf nodes, then compare against published reference counts. Caïssa is checked against five standard positions (including the notorious "Kiwipete") to hundreds of thousands of nodes. If a single castling or promotion rule were wrong, the count would diverge. They match — so move generation is provably correct, not just correct-looking.

2. How it thinks: negamax + alpha-beta

Chess is zero-sum: one side's gain is exactly the other's loss. That symmetry collapses the search into a single recursive function — negamax — where every node maximises the negation of its child's score. Onto that goes alpha-beta pruning: once a reply is found that the opponent will never allow, the rest of that branch can be abandoned unsearched. Mates are scored by distance-to-mate, so the engine prefers a mate in one over a mate in five.

fn negamax(board, depth, alpha, beta, params, ...) -> i32 {
    if depth == 0 { return evaluate_with(board, params); }
    let mut best = -INF;
    for mv in legal_moves(board) {
        let score = -negamax(board, depth-1, -beta, -alpha, params, ...);
        best = max(best, score);
        alpha = max(alpha, best);
        if alpha >= beta { break; }   // the opponent won't allow this line
    }
    best
}

3. How it judges: a linear evaluation

At the leaves, the search needs a number. Caïssa's evaluation is deliberately simple: material (the value of each piece) plus piece-square tables — a small bonus or penalty for each piece depending on where it stands, so knights like the centre and pawns want to advance. That's about 390 numbers in total (six material values plus six 64-square tables).

Crucially, the score is linear in those 390 numbers: it's just a weighted sum of feature counts. That choice looks unremarkable here — but it's the hinge the whole learning story turns on, because the gradient of a linear function is trivial.

4. Watching it think

Rather than search to a fixed depth, Caïssa uses iterative deepening: it searches depth 1, then 2, then 3, and so on, each time trying the previous depth's best move first. That sounds wasteful, but it isn't — alpha-beta prunes far harder once a strong move has already raised the bar, so the extra shallow passes more than pay for themselves, and the engine always has a legal move ready if its time runs out.

It also makes the search watchable. After each completed depth the engine emits one line — depth, score, nodes searched, best move so far — which on the play page streams into a live console as it thinks:

d1  +0.20      21n  → e2e4
d2  +0.15     310n  → g1f3
d3  +0.30   5,012n  → e2e4

5. Running in your browser

The same Rust — the exact unit-tested rules, search, and evaluation — compiles to WebAssembly via wasm-bindgen. There is no server and no round trip: the page holds the position, hands each move to the WASM engine, and gets the reply directly. The search is synchronous Rust, so it runs in a Web Worker to keep the page responsive — and that worker posting a message per depth is exactly what feeds the live console. In Watch mode the opponent is a full Stockfish build, also running client-side in its own worker.

6. Teaching it to teach itself

Hand-tuning 390 evaluation weights by feel is a losing game. So Caïssa tunes them by self-play reinforcement learning — specifically TD-leaf(λ), the classic algorithm (TD-Gammon → KnightCap) for learning a linear evaluation purely from games. No neural network, no ML framework; it's textbook temporal-difference learning you can read end to end.

The loop, one generation at a time: play a batch of games against itself (with a little randomness so games vary); for each game, nudge the weights so that each position's evaluation predicts the eventual outcome a little better; save the new brain; measure its strength in a gauntlet against the frozen original. Because the evaluation is linear, the gradient that drives the nudge is just the feature counts — no autodiff required. That's why step 3 mattered.

And the result, told straight. The reliable signal is the outcome-prediction error on a fixed set of holdout positions — it fell every single generation:

gen 1   err 0.4778    gen 4   err 0.4755
gen 2   err 0.4770    gen 5   err 0.4748
gen 3   err 0.4762    gen 6   err 0.4741

Head-to-head Elo against the baseline is much noisier — at 60 games per measurement it bounced between −23 and +29 — but it trended positive. The two strongest generations are on the board:

Brain	Elo vs baseline	note
Gen 0	0	the hand-tuned baseline
Gen 3	+29	strongest by gauntlet
Gen 5	+17	lower error, sharper eval

This isn't AlphaZero. It's a small, honest demonstration that a linear evaluation can improve itself from self-play alone — and that "improvement" is noisy and non-monotonic when you look at it closely, which is the real shape of reinforcement learning, not the cleaned-up version.

Play the generations

On the board you can pick which brain you face from the Brain dropdown — the hand-tuned Gen 0, or a generation that taught itself. Same search, same rules; only the 390 numbers behind the evaluation change.

Play Caïssa → Read the source →

Built in Rust by Georgios Giannoutsos. Stockfish is GPL-3.0; piece art is Lichess’ cburnett set (GPL).