Trampolining Nix with genericClosure
nix-repl> let count = n: if n == 0 then 0 else 1 + count (n - 1); in count 10001
error: stack overflow; max-call-depth exceeded
Ten thousand and one additions. Nix gives up.
The language is pure, lazy, and has no loops. Every iteration is recursion, and recursion costs stack frames. Since Nix 2.20, the evaluator caps call depth at 10,000 (configurable via max-call-depth, but the default is what you'll hit). Before 2.20, the limit was whatever your OS allocated for the process stack: non-deterministic across machines, occasionally baffling to debug. Tail-call optimization would help. There's even a FIXME comment in ExprApp::eval() acknowledging it. But the evaluator's structure (a local variable that stays live across the recursive eval call) prevents the tail position from being optimized, and nobody has restructured the code. Tvix, the Rust-based evaluator, handles TCO in many cases. The reference C++ evaluator doesn't.
For most Nix code, 10,000 frames is plenty. You won't hit it writing package derivations or NixOS modules. But lib.splitString overflows on inputs past roughly 280 lines. Deep NixOS configurations with home-manager and stylix chain through enough option definitions to reach the limit. And if you're building something where computation length is proportional to input size (an interpreter, a validator, a state machine), the wall is immediate.
This post shows how to get O(1) Nix call-stack depth using a builtin that was designed for something else entirely.
All of it runs in plain nix repl. No category theory, no algebraic effects, no free monads.

The only iterative builtin
builtins.genericClosure {
  startSet = [ { key = "root"; /* ... */ } ];
  operator = node: [ /* new nodes to visit */ ];
}
genericClosure implements a worklist algorithm. Give it a starting set of nodes and an operator function. The operator receives each node and returns new nodes to explore. If a key has already been seen, the node gets skipped. The result is every node visited, in discovery order.
A concrete example. Suppose you're computing which packages a derivation needs:
builtins.genericClosure {
  startSet = [ { key = "a"; deps = [ "b" "c" ]; } ];
  operator = node:
    map (d: { key = d; deps = [ ]; }) node.deps;
}
# => [ { key = "a"; deps = [ "b" "c" ]; }
#      { key = "b"; deps = [ ]; }
#      { key = "c"; deps = [ ]; } ]
Start at "a", discover "b" and "c", visit each once. The key field controls deduplication; everything else rides along as payload. That payload is where we'll hide computation state.
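Deduplication is also what makes cyclic graphs safe. A minimal sketch (a made-up two-node cycle, not from the dependency example above):

```nix
builtins.genericClosure {
  startSet = [ { key = "a"; } ];
  operator = node:
    # "a" points to "b", "b" points back to "a". The second visit to "a"
    # is skipped because its key has already been seen, so this terminates.
    if node.key == "a" then [ { key = "b"; } ]
    else [ { key = "a"; } ];
}
# => [ { key = "a"; } { key = "b"; } ]
```

A recursive traversal of the same graph would loop forever; the key set turns the cycle into a two-element result.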
The implementation is a C++ while (!workSet.empty()) loop: no recursion, no stack growth, a ValueList worklist and an std::map for O(log n) key deduplication. Its original purpose was computing package dependency closures. When nixpkgs replaced the quadratic lib.closePropagation with a single genericClosure call, evaluation got 15.5x faster.
foldl' also iterates in C++ with a strict accumulator, but it requires a pre-built list of known length. You decide how many iterations before you start. genericClosure is the only builtin where the operator controls when to stop, where termination is part of the computation rather than a precondition. There is no builtins.while. There is no builtins.until. A proposal for builtins.trampoline has been open since June 2023: three upvotes, zero comments, proof-of-concept PR not merged. [2] The momentum isn't there.
Repurposing as trampoline
sternenseemann saw this in 2022. [1] Each step of your computation becomes a "node." The operator handles one step and returns a singleton list containing the next. Return [] when done.
steps = builtins.genericClosure {
  startSet = [ { key = 0; state = initialState; } ];
  operator = step:
    let next = oneStep step.state;
    in if next.done then [ ]
       else [ { key = step.key + 1; state = next.state; } ];
};
Stack depth: O(1). The operator is called from C++, returns a list, and the C++ loop calls it again. Nix call frames don't accumulate.
sternenseemann published a tailCallOpt wrapper that converts any tail-recursive function into this pattern. He tested it, confirmed it worked, and concluded it was "pretty much useless" due to the overhead of copying state at every step. [1] He also noticed a deeper problem, one he explicitly said he hadn't solved.
The thunk trap
genericClosure forces the key field of each node for deduplication. Everything else stays lazy. What "everything else" means depends on Nix's call-by-need evaluation, and the interaction is subtle. Try a naive trampoline in nix repl:
let
  naive = init: step:
    let
      steps = builtins.genericClosure {
        startSet = [ { key = 0; state = init; } ];
        operator = s:
          let result = step s.state;
          in if result.done then [ ]
             else [ { key = s.key + 1; state = result.state; } ];
      };
    in (builtins.elemAt steps (builtins.length steps - 1)).state;
in naive { i = 0; total = 0; } (s:
  if s.i >= 100
  then { done = true; state = s; }
  else { done = false; state = { i = s.i + 1; total = s.total + 1; }; })
At 100 steps, this returns { i = 100; total = 100; }. Change the target to 65000 and run again. genericClosure completes all 65,000 steps without issue. Then Nix tries to print the result and crashes.
Two fields, two fates. The comparison s.i >= 65000 forces s.i at every step. Nix is call-by-need: once a thunk is forced, the runtime updates it in place with the concrete value. The next step's s.i + 1 references that concrete integer. No chain forms for i. But nothing in the loop inspects s.total. After N steps, total is a chain of N deferred additions:
((((0 + 1) + 1) + 1) + ...)
└── N layers deep
Here's what makes this insidious: the trampoline runs fine. genericClosure's C++ loop processes all 65,000 steps without complaint. The failure happens when you try to use the result. Forcing that final total unwinds the entire thunk chain as recursive C++ forceValue calls, rebuilding exactly the stack depth you thought you'd eliminated. The error is stack overflow (possible infinite recursion), not max-call-depth exceeded: this is the C++ call stack, not the Nix evaluator's depth limit. A simple integer counter where the comparison is the state (n: if n >= N then ...) would survive, because the comparison forces the state at every step and call-by-need memoization prevents the chain. The trap springs when your state has components the step function doesn't touch.
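To see the distinction directly, feed the same naive trampoline a bare integer, where the comparison is the entire state:

```nix
# Reusing the naive function defined above. The >= comparison forces the
# whole state at every step, and call-by-need memoization means each
# n + 1 thunk sits on a concrete integer. No chain forms, so this
# survives 65,000 steps even without deepSeq.
naive 0 (n:
  if n >= 65000
  then { done = true; state = n; }
  else { done = false; state = n + 1; })
# => 65000
```

The compound-state version dies at the same step count only because `total` never gets inspected along the way.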
The stack overflow moves from the computation to the result. It's invisible until you access the value, which might be far from the trampoline call site, in a completely different part of your program.
sternenseemann's diagnosis was precise: "thunks can be built up by the value in the set, and I've not toyed with this." [1] Correct observation. No published solution, as far as we can find. Four years and counting.
Breaking the chain
key = builtins.deepSeq newState (step.key + 1);
One line. builtins.deepSeq x y fully evaluates x, then returns y. Embed deepSeq newState inside the key expression, and you piggyback on genericClosure's key-forcing. When genericClosure forces the key for dedup, it forces the state too. Every step. The thunk chain never forms.
The key still increments monotonically. Deduplication never triggers because every key is unique, which means we're abusing the dedup mechanism for a side effect it was never designed to provide: forcing evaluation of whatever we embed in the key expression, in this case the entire computation state, at every step of the computation. Smuggling eager evaluation into a lazy language through the one field that genericClosure is guaranteed to inspect.
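The difference between forcing to weak head normal form and forcing deeply is easy to check in nix repl (a throwaway example, not from the library):

```nix
# seq forces only the outer shape: the attrset exists, its values
# stay as unevaluated thunks, so the abort is never reached.
builtins.seq { x = builtins.abort "inner"; } "ok"
# => "ok"

# deepSeq recurses into the attrset's values and hits the abort.
builtins.deepSeq { x = builtins.abort "inner"; } "ok"
# => error: evaluation aborted with the following error message: 'inner'
```

That recursive force is exactly what the key expression smuggles into every step.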
We searched Discourse, GitHub issues, nixpkgs source, blog posts. To the best of our knowledge, this trick hasn't been documented elsewhere. If someone described it before nix-effects, we'd like the reference. We haven't found one.
Try it
Paste into nix repl:
let
  trampoline = init: step:
    let
      steps = builtins.genericClosure {
        startSet = [ { key = 0; state = init; } ];
        operator = s:
          let result = step s.state;
          in if result.done then [ ]
             else [{
               key = builtins.deepSeq result.state (s.key + 1);
               state = result.state;
             }];
      };
    in (builtins.elemAt steps (builtins.length steps - 1)).state;
in (trampoline { i = 0; total = 0; } (s:
  if s.i >= 100000
  then { done = true; state = s; }
  else { done = false; state = { i = s.i + 1; total = s.total + 1; }; })).total
Returns 100000. A hundred thousand iterations, O(1) stack depth. The deepSeq forces both i and total at every step, so no thunk chain survives.
Now try the trap. Remove builtins.deepSeq result.state from the key line, use s.key + 1 directly, and run it again. genericClosure will churn through all 100,000 steps without complaint. Then Nix tries to evaluate .total, forces a chain 100,000 thunks deep, and dies. The error appears to come from the field access, not the computation. That misdirection is exactly why sternenseemann shelved the whole approach.
Measuring the cost
The same computation from the opening, recursive:
$ nix eval --expr 'let count = n: if n == 0 then 0 else 1 + count (n - 1); in count 10001'
error: stack overflow; max-call-depth exceeded
Raise the target to 50,000, a million, ten million. Same result. The evaluator has no tail-call optimization and max-call-depth defaults to 10,000.
With the trampoline:
$ time nix eval --expr '
  let
    trampoline = init: step:
      let
        steps = builtins.genericClosure {
          startSet = [ { key = 0; state = init; } ];
          operator = s:
            let result = step s.state;
            in if result.done then [ ]
               else [{
                 key = builtins.deepSeq result.state (s.key + 1);
                 state = result.state;
               }];
        };
      in (builtins.elemAt steps (builtins.length steps - 1)).state;
  in trampoline 0 (n:
    if n >= 1000000
    then { done = true; state = n; }
    else { done = false; state = n + 1; })
'
1000000
real 0m1.0s
Same computation with foldl', which iterates in C++ with a strict accumulator over a pre-built list:
$ time nix eval --expr 'builtins.foldl'\'' (acc: _: acc + 1) 0 (builtins.genList (x: x) 1000000)'
1000000
real 0m0.09s
One crashes at ten thousand, the other handles a million. The trampoline is slower than foldl' because genericClosure's std::map does O(log n) per step for key dedup, and with compound state deepSeq adds forcing cost on top. We benchmarked both with hyperfine (15+ runs each, 5 warmup, IQR-filtered outliers from GC and system load):
| | trampoline | foldl' | ratio |
|---|---|---|---|
| 1M iterations (int state) | 1.03s (σ = 45ms, n = 34) | 94ms (σ = 6ms, n = 58) | ~11x |
| 1M iterations (compound state) | 1.45s (σ = 69ms, n = 33) | stack overflow | n/a |
Measured on a 13th Gen Intel i7-1355U, 16GB, NixOS, Nix 2.31.3. The "compound state" row uses { i = 0; total = 0; } where deepSeq must force both fields at every step.
The foldl' overflow is worth seeing for yourself:
$ nix eval --expr 'builtins.foldl'\'' (acc: _: { i = acc.i + 1; total = acc.total + 1; }) { i = 0; total = 0; } (builtins.genList (x: x) 100000)'
error: stack overflow (possible infinite recursion)
foldl' forces the accumulator at each step, but only to weak head normal form. For an attrset, WHNF means "yes, this is an attrset." The values inside (i, total) stay as thunks. After 100,000 steps, each field is a chain of 100,000 deferred additions, the same pathology as the naive trampoline's non-key fields. On this machine, the C++ stack gives out around 65,000 deep. The trampoline with deepSeq handles the same workload without issue because it forces the entire state, not just its outer shape:
$ nix eval --expr '
  let
    trampoline = init: step:
      let
        steps = builtins.genericClosure {
          startSet = [ { key = 0; state = init; } ];
          operator = s:
            let result = step s.state;
            in if result.done then [ ]
               else [{
                 key = builtins.deepSeq result.state (s.key + 1);
                 state = result.state;
               }];
        };
      in (builtins.elemAt steps (builtins.length steps - 1)).state;
  in (trampoline { i = 0; total = 0; } (s:
    if s.i >= 100000
    then { done = true; state = s; }
    else { done = false; state = { i = s.i + 1; total = s.total + 1; }; })).total
'
100000
When your iteration count is known in advance and your state is a single value, foldl' is the right tool. The trampoline is for when you don't know the count, your state is compound, or both.
Where we use this
In nix-effects, this trampoline is the evaluation loop for a freer monad interpreter. A computation is a chain of algebraic effects: send "get" null, then send "put" 42, then more effects, possibly thousands deep. A naive recursive interpreter would call itself for each one, building stack proportional to chain length. With the trampoline, each effect is one genericClosure step: the operator calls the handler, which returns a resume value (feed to the continuation, keep going) or an abort value (discard the continuation, halt immediately). Continuations compose via an FTCQueue, a purely functional queue with O(1) snoc and amortized O(1) uncons, which eliminates the left-nesting pathology that makes naive free monads quadratic.
Here is the core of the real interpreter, stripped to essentials:
operator = step:
  if step._comp._tag == "Pure" then [ ]
  else let
    eff = step._comp.effect;
    result = handlers.${eff.name} {
      param = eff.param;
      state = step._state;
    };
    newState = result.state;
  in [{
    key = builtins.deepSeq newState (step.key + 1);
    _comp = if result ? abort
            then { _tag = "Pure"; value = result.abort; }
            else queue.qApp step._comp.queue result.resume;
    _state = newState;
  }];
Look up the handler, call it, deepSeq the state, produce the next node. When _comp._tag is "Pure", return [] and halt. The full implementation in trampoline.nix is about 100 lines, most of it error messages and the handle combinator that wraps run with a return clause.
In the test suite, 100,000 chained operations pass. The bare trampoline handles 1,000,000 iterations in about a second (see the benchmark above); the full interpreter is slower due to handler dispatch and queue operations, but stack depth stays constant regardless. Left-nested bind chains of depth 1,000, the pathological case for naive free monads, resolve correctly because FTCQueue restructures them at each view. For a language with no loops, that's a reasonable interpreter.
You don't need freer monads or algebraic effects to use the trampoline pattern. Anything that loops fits the same shape.
When to reach for this
Most Nix code will never hit the stack limit. Don't add a trampoline to code that recurses 50 times. Reach for this when:
- Computation length scales with input. Validating N fields, transforming N records, processing a list where N might exceed 10,000.
- You're building an interpreter. Any recursive evaluator over user-supplied input will eventually exceed the limit.
- foldl' doesn't fit. If you have a pre-built list and a known iteration count, foldl' is simpler, with no dedup overhead. Use genericClosure when the operator decides when to stop.
For comparison, here's the same counter with foldl':
builtins.foldl' (acc: _: acc + 1) 0 (builtins.genList (x: x) 100000)
Simpler, roughly 11x faster at this scale (no key map overhead), and perfectly fine when you know the iteration count. The trampoline is for when you don't: parsing until you hit a delimiter, evaluating until a fixpoint, running a state machine until it halts.
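A concrete case where the count is unknowable in advance: a Collatz-style loop (a sketch reusing the trampoline defined earlier; the input 27 is arbitrary):

```nix
# Run the Collatz map until it reaches 1. Nobody knows how to bound the
# step count from the input, so foldl' over genList can't express this;
# here the operator simply stops when the state says to.
trampoline { n = 27; steps = 0; } (s:
  if s.n == 1
  then { done = true; state = s; }
  else {
    done = false;
    state = {
      # halve if even, else 3n + 1 (Nix / is integer division)
      n = if builtins.bitAnd s.n 1 == 0 then s.n / 2 else 3 * s.n + 1;
      steps = s.steps + 1;
    };
  })
# => { n = 1; steps = 111; }
```

The deepSeq in the trampoline's key forces both `n` and `steps` every iteration, so neither field accumulates a thunk chain.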
Trade-offs worth knowing: genericClosure's std::map tracks seen keys at O(log n) per step. With unique monotonic keys, the check is effectively a sorted insert, but the map still grows linearly with step count. State must be data that deepSeq can fully evaluate. deepSeq recurses through attrsets and lists, but a function value is already in normal form. There's nothing inside a closure for deepSeq to force. If each step builds a new closure that wraps the previous one (say, { process = x: prev.process (x + 1); } where prev is last step's state), the chain of closure references grows with N. deepSeq sees a function, stops, and the chain survives. The trampoline runs fine; the blowup arrives when you call the accumulated function. A constant function carried unchanged across steps causes no problem at any N.
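The closure trap above can be sketched concretely (hypothetical state, not from the library):

```nix
# State carrying a function that wraps the previous step's function.
# deepSeq sees a closure, which is already in normal form, and stops;
# the reference chain to the previous state survives every step.
trampoline { n = 0; process = x: x; } (s:
  if s.n >= 100000
  then { done = true; state = s; }
  else {
    done = false;
    state = {
      n = s.n + 1;
      process = x: s.process (x + 1);  # one more layer per step
    };
  })
# The trampoline itself completes in O(1) stack depth. The blowup is
# deferred: applying the result's .process unwinds 100,000 nested calls
# and overflows, exactly as the paragraph above describes.
```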
If builtins.trampoline lands, prefer it. [2] No dedup overhead, no key bookkeeping, clean semantics. As of February 2026, there's a proof-of-concept PR that saved roughly 100 thunks on pkgs.jq. Not a compelling enough result, which may explain why it hasn't been merged.
[1] sternenseemann. "Tail Call Optimization in Nix Today." NixOS Discourse, February 2022. discourse.nixos.org/t/tail-call-optimization-in-nix-today/17763
[2] roberth. "builtins.trampoline." NixOS/nix #8430, June 2023. POC: hsjobeki, NixOS/nix #14553, November 2025.