The Mandelbrot Set

The math, and how this repo renders it on the GPU in a few hundred lines.

1. What is it?

The Mandelbrot set is a subset of the complex plane. For every complex number $c = c_x + c_y i$ you run a simple iteration starting at zero:

$z_0 = 0, \qquad z_{n+1} = z_n^2 + c$

You then ask a single question: does the sequence $z_0, z_1, z_2, \ldots$ stay bounded forever, or does it fly off to infinity?

That's the entire definition. The infinite fractal complexity on your screen falls out of those two lines.

2. The escape criterion

You don't actually need to run the iteration forever. A classical result: if $\lvert z_n \rvert > 2$ for any $n$, the sequence is guaranteed to diverge to infinity.

So in practice:

Working in real coordinates (since GPUs don't have native complex numbers), $z_{n+1} = z_n^2 + c$ expands to:

$z_x' = z_x^2 - z_y^2 + c_x$
$z_y' = 2\, z_x z_y + c_y$

This repo uses a larger bailout of $\lvert z \rvert^2 > 256$ instead of the theoretical minimum of 4. Why? It makes the smooth coloring below much smoother — without changing which points end up in the set.

3. Smooth (continuous) iteration coloring

If you just color by the raw integer iteration count at escape, you get visible bands — each pixel is forced into one of $N$ discrete colors.

The smooth iteration count trick gives you a continuous value:

$\nu = n + 1 - \dfrac{\log\left(\frac{\log \lvert z_n \rvert}{\log 2}\right)}{\log 2}$

This is derived from the observation that once $z_n$ escapes past the bailout, one more iteration roughly squares its magnitude — so $\log \log \lvert z_n \rvert$ changes by one unit per iteration. That lets you interpolate between integer iteration counts and kill the banding.

The repo then feeds $\nu$ into a cosine palette — three offset cosines for red, green, blue — to produce the smooth rainbow gradient you see outside the set.

4. How this repo implements it

Two files. That's it.

FileRole
src/main.rsWindowing, input, GPU setup, view-state math (pan/zoom/reset/iter control)
src/shader.wgslThe Mandelbrot math itself — runs once per pixel, in parallel, on the GPU

The full-screen triangle trick

The vertex shader doesn't render geometry — the Mandelbrot "shape" doesn't exist as geometry, it is the coloring of space itself. So the vertex shader emits a single triangle big enough to cover the entire screen:

var p = array<vec2<f32>, 3>(
    vec2<f32>(-1.0, -1.0),
    vec2<f32>( 3.0, -1.0),
    vec2<f32>(-1.0,  3.0),
);

Two of those vertices extend off-screen, but the GPU only shades what's inside the viewport, so every visible pixel gets one fragment-shader invocation. No vertex buffer. No geometry. Just a pretext to run the fragment shader on every pixel in parallel.

Per-pixel: the fragment shader

Each pixel independently:

  1. Converts its UV coordinate into a complex number $c$ based on the current view (center + zoom).
  2. Runs the $z \to z^2 + c$ iteration up to max_iter times.
  3. If it never escapes, outputs black.
  4. Otherwise computes the smooth iteration count $\nu$ and applies the cosine palette.

The core loop in WGSL:

loop {
    if (i >= max_i) { break; }
    let zx = z.x * z.x - z.y * z.y + c.x;
    let zy = 2.0 * z.x * z.y + c.y;
    z = vec2<f32>(zx, zy);
    if (dot(z, z) > bail) { break; }
    i = i + 1u;
}

This loop runs millions of times in parallel every frame — one invocation per pixel — which is why the GPU chews through it in milliseconds and why panning feels instant.

Uniforms: the bridge between CPU and shader

Each frame the CPU writes the current view state (center, zoom, aspect ratio, max iterations) into a uniform buffer, which the shader reads. This is how drag, scroll, and R change what gets rendered without ever recompiling or re-uploading geometry.

Note on precision. The CPU-side View keeps center_x, center_y, and zoom in f64 for precise navigation. On upload, each f64 is split into a (hi, lo) pair of f32s, and the shader iterates in double-float arithmetic — see Deep-zoom precision below. This pushes the pixelation floor from ~$10^{-5}$ down to ~$10^{-14}$ without requiring native f64 support on the GPU.

Zoom-to-cursor

When you scroll the wheel, the repo does more than just change zoom — it also shifts center so that the complex point under your cursor stays fixed. The invariant is:

$\text{pixel\_to\_complex}(\text{mouse})$ is the same before and after the zoom.

This is what makes exploration feel natural: you aim at a detail, scroll, and it grows toward you rather than drifting off the edge of the screen.

5. Deep-zoom precision: double-float arithmetic

The fragment shader doesn't iterate in plain f32. It iterates in double-float — a software-emulated extended precision where each number is stored as a pair $(hi, lo)$ of f32s with the invariant $|lo| \le \tfrac{1}{2}\,\text{ulp}(hi)$. The "real" value represented by the pair is simply $hi + lo$; the $lo$ word holds the bits that would otherwise be discarded by the f32 rounding of $hi$.

Why bother?

f32 has ~7 decimal digits of mantissa. Once your zoom window is smaller than roughly $10^{-6}$ of the complex plane, neighboring pixels map to the same f32 value for $c$ — every pixel inside one block then computes identical iterations, producing visible pixelation.

The natural fix — f64 in the shader — isn't available: WGSL has no f64 type, and on consumer GPUs hardware f64 (where exposed through other APIs) typically runs at 1/32 to 1/64 of f32 throughput. Double-float emulation lands in between: ~14 digits of precision at roughly 4–8× the cost of plain f32, which is far cheaper than true hardware f64 would be.

Error-free transformations

DF arithmetic rests on two primitives that compute $a + b$ or $a \cdot b$ plus the roundoff they incurred, so nothing is lost:

two_sum (Knuth, 1969). Returns $(s, e)$ with $s = \text{fl}(a+b)$ — the normal floating-point sum — and $e = (a+b) - s$ held exactly in the second word:

fn two_sum(a, b) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    return (s, e);
}

This only works because the compiler is not allowed to simplify $(a+b) - a$ to $b$ — in floating-point that identity is false. WGSL forbids the reassociation that would break it, so the trick survives.

two_prod does the same for multiplication, recovering the roundoff via Dekker's splitting trick. The 24-bit f32 mantissa is split at bit 12 using the magic constant $2^{12} + 1 = 4097$:

fn split(a) {
    let t = 4097.0 * a;
    let hi = t - (t - a);    // top 12 bits of mantissa
    let lo = a - hi;          // bottom 12 bits, exact
    return (hi, lo);
}

Once both operands are split into 12-bit halves, the four partial products fit exactly in f32 and can be recombined to reconstruct $a \cdot b$ as a $(hi, lo)$ pair.

From primitives to DF add/multiply

With those in hand, the DF operations are a handful of lines:

The Mandelbrot iteration then rewrites mechanically: every + becomes df_add, every * becomes df_mul, and multiplication by 2 stays a plain scale of both words (exact in IEEE-754, no rounding). The bailout check is an exception — it compares against 256, which sits enormously far from f32 epsilon, so the lo words don't affect the comparison and the shader just uses the hi parts.

A trap to avoid. WGSL has an fma(a, b, c) intrinsic, and it's tempting to shrink two_prod down to one line: $(a \cdot b,\ \text{fma}(a, b, -a\cdot b))$. Don't. The WGSL spec does not require fma to be single-rounded — if the driver emits a separate multiply-then-add, the error term becomes identically zero and the extended precision silently collapses back to f32. Dekker split costs ~10 extra ops per product but works on every backend.

CPU side: splitting f64 into (hi, lo)

The CPU still stores center_x, center_y, and zoom as f64 — plenty of headroom for panning and zooming past the point where DF itself runs out. On each frame, each f64 is split into a pair of f32s and uploaded through the uniform buffer:

fn split_f64(x: f64) -> (f32, f32) {
    let hi = x as f32;
    let lo = (x - hi as f64) as f32;
    (hi, lo)
}

Net result, and the next wall

Pixelation that used to show up past zoom ≈ $10^{-5}$ now sets in around $10^{-14}$ — about seven extra orders of magnitude of usable zoom. Pushing further (the $10^{-30}$ range and below, where classic deep-zoom tools live) requires a different algorithmic class entirely: arbitrary-precision arithmetic to evaluate one reference orbit on the CPU, combined with perturbation theory to render every other pixel on the GPU as a small f32 delta from that reference. That's how tools like Kalle's Fraktaler reach zooms that would otherwise need thousands of decimal digits of precision per pixel. It's out of scope for this repo.

6. Things worth exploring

Try this: reset with R, zoom toward the "neck" where the main cardioid meets the period-2 bulb (around $c = -0.75$). That's the Feigenbaum point — where the iteration transitions from stable to chaotic, and the source of bifurcation theory in dynamical systems.