I open Claude Code in the morning and it knows nothing. Not the architecture I explained yesterday, not the convention I established last week, not that two other sessions are running right now, editing the same files.
So I paste context. Re-explain decisions. Watch agents step on each other's work.
Dozens of "memory for AI" tools save text to a vector database and retrieve it by cosine similarity. That's a key-value store with extra steps. It doesn't answer the questions I actually have: Which sessions are editing auth.lisp right now? Did the session that crashed at 2am leave work half-finished? Which of my saved patterns helped, and which were noise?
The problem isn't remembering. It's structure.
I ran into this building lie-stormer.no — the website for Norway's first national mathematics research center, an ERCOM member. The stakes were real and the infrastructure had to work. Common Lisp, NixOS, EUR 20/month on a Hetzner VPS. In October 2025, nineteen days after a paper on agentic context engineering was published [1], I adopted its methodology: the agent writes observation files after every phase — research, planning, implementation, reflection. Each task a directory. Each observation a markdown file.
Before the methodology: 104 commits across two repos in four months. After: 257 commits in November alone. 230 tasks. 117 handoff documents. I know the exact dates because the system recorded them.
At 230 tasks, the file-based system hit its limits. No causal ordering across concurrent sessions. No programmatic queries — finding tasks that mentioned a file meant grep across hundreds of markdown files. Two agents could edit the same task without knowing about each other. The methodology worked. The infrastructure underneath it didn't.
kli is what I built to replace that infrastructure. One binary. Hooks watch your sessions. Two MCP servers expose 31 tools. Everything goes into append-only event logs — plain JSONL files, under version control, next to your code. No database, no external service, no telemetry.
Every code block on this page is live — hover over one and click evaluate to run it on this server's Common Lisp image. Definitions persist across blocks.
kli stores nothing in a database. Each task is a JSONL file. State is what you get when you replay the events:
;; An event log is a list of things that happened.
;; State = replay the history.
(defun replay (events)
  "Fold events into state. No database."
  (let ((obs nil) (status "pending") (edges nil))
    (dolist (ev events)
      (case (first ev)
        (:observe (push (second ev) obs))
        (:status (setf status (second ev)))
        (:link (push (second ev) edges))))
    (list :status status
          :observations (nreverse obs)
          :edges (nreverse edges))))
;; Five events go in. State comes out.
(replay '((:observe "Auth module uses JWT with RS256 signing")
          (:observe "Rate limiter is in middleware/rate-limit.ts")
          (:status "active")
          (:link "implement-rate-limiting")
          (:observe "Found existing refresh logic in auth/refresh.ts")))
The real apply-task-event in lib/task/state.lisp handles 17 event types. The task-state struct has 11 fields, each a conflict-free replicated data type [2]. But the principle is the same: create empty state, iterate, apply each event. A fold.
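The same shape fits in one expression. Here is a minimal sketch of that fold using reduce; task-snapshot and apply-event below are toy stand-ins for the real struct and its 17 event types, not kli's implementation.
;; A toy fold: REDUCE applies one event at a time to a snapshot.
;; TASK-SNAPSHOT and APPLY-EVENT are illustrative stand-ins, not kli's code.
(defstruct task-snapshot (status "pending") (observations nil))

(defun apply-event (state ev)
  "Apply one event and return a new snapshot."
  (let ((next (copy-task-snapshot state)))
    (case (first ev)
      (:observe (setf (task-snapshot-observations next)
                      (append (task-snapshot-observations next)
                              (list (second ev)))))
      (:status (setf (task-snapshot-status next) (second ev))))
    next))

;; State is just REDUCE over the event log.
(reduce #'apply-event
        '((:observe "Auth module uses JWT with RS256 signing")
          (:status "active")
          (:observe "Rate limiter is in middleware/rate-limit.ts"))
        :initial-value (make-task-snapshot))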
This means task data lives in ace/tasks/ as plain files. git log shows your task history. grep finds observations. Every mutation carries a timestamp and session ID. No schema migrations, no external service to run. The trade-off: no SQL queries, no joins across tasks. So we built TQ and PQ — purpose-built query languages that Claude uses via MCP tools to query the task and pattern graphs.
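To make the file side concrete, here is a minimal sketch of the append-and-replay round trip, reusing the replay function from above. It writes s-expression lines rather than kli's actual JSONL schema, and the path is made up for the example.
;; Append events as lines, then fold them back into state.
;; S-expressions stand in for JSON; the path and event shapes are illustrative.
(defun append-event (path event)
  "Append one event to the log. The log is the source of truth."
  (with-open-file (out path :direction :output
                            :if-exists :append
                            :if-does-not-exist :create)
    (prin1 event out)
    (terpri out)))

(defun replay-file (path)
  "Read every event back and fold it into state with REPLAY."
  (with-open-file (in path)
    (loop for event = (read in nil nil)
          while event
          collect event into events
          finally (return (replay events)))))

(append-event "/tmp/demo-task.log" '(:observe "Auth module uses JWT"))
(append-event "/tmp/demo-task.log" '(:status "active"))
(replay-file "/tmp/demo-task.log")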
And because events are append-only, two agents writing to the same task don't conflict. They append to the same file.
Append-only isn't enough, though. When two sessions edit the same task, their events need to merge — not just concatenate. If one agent sets the status to "active" and another sets it to "blocked," you need a deterministic answer regardless of merge order.
CRDTs solve this [2]. A Last-Writer-Wins register always converges to the same value:
;; Two agents write to the same register at the same time.
;; Whoever wrote later wins. Merge order doesn't matter.
(defstruct lww-reg value timestamp)
(defun lww-merge (a b)
  "Merge two registers. Highest timestamp wins."
  (if (> (lww-reg-timestamp a) (lww-reg-timestamp b)) a b))
;; Agent A writes "investigating" at time 1000
;; Agent B writes "fix-confirmed" at time 1042
(let ((agent-a (make-lww-reg :value "investigating" :timestamp 1000))
      (agent-b (make-lww-reg :value "fix-confirmed" :timestamp 1042)))
  (list
   :merge-a-then-b (lww-reg-value (lww-merge agent-a agent-b))
   :merge-b-then-a (lww-reg-value (lww-merge agent-b agent-a))))
Same result both ways. No coordination protocol, no locks, no conflict resolution UI.
kli uses six CRDT types: G-Sets for observations (grow-only — you never delete an observation), OR-Sets for edges (add and remove with unique tags), LWW-Registers for status and claims, plus LWW-Maps, PN-Counters, and vector clocks [3] for causal ordering. When two agents edit the same task, every write from both survives the merge.
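The LWW register above is the simplest of the six. Two more, sketched minimally and illustratively rather than as kli's actual implementation: a grow-only set merges by union, and a vector clock merges by taking the element-wise maximum of per-session counters.
;; G-Set: observations only grow, so merge is set union.
;; Commutative, associative, idempotent: merge order never matters.
(defun g-set-merge (a b)
  (union a b :test #'equal))

;; Vector clock: one counter per session, represented here as an alist.
;; Merge keeps the highest counter seen for each session.
(defun vclock-merge (a b)
  (let ((sessions (remove-duplicates
                   (append (mapcar #'car a) (mapcar #'car b))
                   :test #'equal)))
    (loop for s in sessions
          collect (cons s (max (or (cdr (assoc s a :test #'equal)) 0)
                               (or (cdr (assoc s b :test #'equal)) 0))))))

(list :g-set  (g-set-merge '("uses JWT" "rate limit in middleware")
                           '("refresh logic in auth/refresh.ts" "uses JWT"))
      :vclock (vclock-merge '(("session-A" . 3) ("session-B" . 1))
                            '(("session-B" . 2))))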
Most agent coordination systems use messaging — requests, responses, shared state through a broker. kli uses stigmergy instead.
Stigmergy is how termite colonies build without a foreman [4]. Each termite deposits material and responds to what others deposited. No communication between individuals. Structure emerges from traces left in the environment.
In kli, every Edit or Write tool call is a trace deposit. When a new session starts, it reads those traces:
;; Every tool call leaves a trace.
;; Agents find each other by reading the environment.
(defvar *traces* nil)
(defun deposit-trace (session file action)
  "Leave a trace — like a pheromone trail."
  (push (list :session session :file file :action action
              :time (get-internal-real-time))
        *traces*)
  (format nil "~A: ~A ~A" session action file))
(defun who-touched (file &optional exclude-session)
  "Who else modified this file?"
  (remove-if
   (lambda (tr) (equal (getf tr :session) exclude-session))
   (remove-if-not
    (lambda (tr) (equal (getf tr :file) file))
    *traces*)))
;; Session A works on auth
(deposit-trace "session-A" "src/auth.lisp" :edit)
(deposit-trace "session-A" "src/middleware.lisp" :edit)
;; Session B is about to edit auth.lisp — check first
(let ((activity (who-touched "src/auth.lisp" "session-B")))
  (if activity
      (format nil "Warning: ~A edited this file"
              (getf (first activity) :session))
      "No conflicts — safe to edit"))
kli builds on this with session fingerprints. Each session gets a behavioral vector — tools used, files touched, observation embeddings, graph proximity. Sessions are classified as builders or observers. Two builders on the same files trigger a conflict warning. An observer gets visibility without noise.
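A toy version of that classification, looking only at tool counts: the one-third threshold and the write-heavy definition of a builder are made up for the sketch, and the real fingerprint also weighs files touched, observation embeddings, and graph proximity.
;; Classify a session from the tools it called. Thresholds are illustrative.
(defun classify-session (tool-calls)
  "Builders mostly write; observers mostly read."
  (let ((writes (count-if (lambda (tool) (member tool '(:edit :write)))
                          tool-calls))
        (total (length tool-calls)))
    (if (and (plusp total) (> (/ writes total) 1/3))
        :builder
        :observer)))

(list (classify-session '(:read :grep :edit :write :edit))
      (classify-session '(:read :read :grep :read :glob)))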
Beyond conflicts, kli uses traces for orphan pickup. When a session crashes, the next session that bootstraps the same task discovers the abandoned phases and claims them. And find-missing-edges watches for sessions that keep jumping between two unlinked tasks — if sessions repeatedly transition between A and B with no connecting edge, kli suggests one.
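A sketch of the missing-edge heuristic under the same caveat: the helper names and the threshold of three transitions are made up, and the real find-missing-edges works over recorded session traces rather than a bare list of task visits.
;; If a session keeps jumping between two tasks that share no edge,
;; suggest linking them. Names and threshold are illustrative.
(defun count-transitions (visits a b)
  "Count direct jumps between tasks A and B in a visit sequence."
  (loop for (from to) on visits
        while to
        count (or (and (equal from a) (equal to b))
                  (and (equal from b) (equal to a)))))

(defun suggest-edge-p (visits a b already-linked-p)
  (and (not already-linked-p)
       (>= (count-transitions visits a b) 3)))

(suggest-edge-p '("fix-auth" "add-rate-limit" "fix-auth" "add-rate-limit" "fix-auth")
                "fix-auth" "add-rate-limit" nil)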
kli init configures Claude Code with 31 MCP tools across two servers and 6 lifecycle hooks. Then it stays out of your way.
The task server (28 tools) handles task creation, observations, and DAG construction with typed edges. Claude queries the task graph through TQ, a pipeline language — (-> (active) :enrich (:sort :obs-count) (:take 5)) returns the five most-observed active tasks. The playbook server (3 tools) retrieves patterns via spreading activation over a co-application graph, records feedback, and evolves patterns over time. A pattern's effectiveness is its helpful count minus its harmful count. Patterns that hurt get demoted.
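A minimal sketch of that scoring, with made-up field names and sample patterns:
;; Effectiveness = helpful count minus harmful count. Field names are illustrative.
(defun pattern-effectiveness (pattern)
  (- (getf pattern :helpful 0) (getf pattern :harmful 0)))

(defun demote-p (pattern)
  "Patterns that hurt more than they help get demoted."
  (minusp (pattern-effectiveness pattern)))

(mapcar (lambda (p) (list (getf p :name) (pattern-effectiveness p) (demote-p p)))
        '((:name "prefer-keyword-tests" :helpful 9 :harmful 1)
          (:name "always-rebuild-image" :helpful 2 :harmful 5)))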
The hooks run without configuration: session start, session leave, tool call tracking, task context writes, file conflict detection, pattern activation recording. You don't invoke them. They watch.
If you want structure, kli ships workflows: /kli:research for codebase exploration with observation capture, /kli:plan for iterative planning with phase decomposition, /kli:implement for TDD with verification gates, /kli:reflect for pattern extraction. Or skip them — hooks and MCP servers do their work either way.
| Without kli | With kli |
|---|---|
| Context lost between sessions | Event log persists observations, handoffs, plans |
| Same mistakes repeated | Patterns carry helpful/harmful scores, surface automatically |
| Parallel sessions overwrite files | File conflict detection via behavioral traces |
| Flat task lists | DAG with typed edges, queryable via TQ pipelines |
| Manual context loading every session | task_bootstrap loads full graph context in one call |
The post you're reading was developed using kli — not as a demonstration, but because we use kli for everything.
That's the actual task graph. Four sessions, 27 observations, 11 handoffs over two days. The first session fixed SSE encoding bugs and cleaned up CSS. The second built the interactive features — the REPL you've been using, the embed system, a CSRF fix for the HTMX runtime. The third and fourth rewrote this text until the voice stopped sounding like a press release.
Each code block you evaluated ran on this server's Common Lisp image via WebSocket. Your definitions from earlier — replay, lww-merge, deposit-trace — are still live in the image. That's image-based development: compile one function and it's live in seconds. No deploy pipeline, no container rebuild. The trade-off is a smaller ecosystem and fewer contributors.
This blog runs on lol-reactive, a Common Lisp web framework from the same monorepo. Server-rendered HTML, HTMX for interactivity, SSE for live updates. One SBCL image handles MCP servers, hook dispatch, dashboard, and this blog. No Python, no Node, no Docker. The trade-off is a larger binary. We picked zero runtime dependencies over small download size.
The task graph that planned this launch has 6 phases: brand library, blog service, docs service, launch post, documentation, deployment. Three are done. The patterns that guided the writing were surfaced by kli's playbook — the same system that helps Claude Code remember your project conventions.
Same structure as the 230 tasks on lie-stormer.no. File-based observations became event logs. Grep across markdown became TQ queries. The methodology is unbroken from October 2025 to this paragraph, and every number in this post was collected by kli or its predecessor.
kli is open source under MIT. Code at github.com/kleisli-io/kli.
curl -fsSL https://kli.kleisli.io/install | sh
kli init
We're a small team in Tromsø, Norway. If you've solved agent coordination differently, or if you think CRDTs are overkill for this, we'd like to hear from you.
1 "Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models." Stanford, SambaNova Systems, UC Berkeley. arXiv:2510.04618, October 2025.
2 M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski. "Conflict-Free Replicated Data Types." SSS 2011, LNCS 6976, pp. 386–400. Springer, 2011.
3 C. J. Fidge. "Timestamps in Message-Passing Systems That Preserve the Partial Ordering." Proceedings of the 11th Australian Computer Science Conference, 1988. See also F. Mattern, "Virtual Time and Global States of Distributed Systems," 1988.
4 P.-P. Grassé. "La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp." Insectes Sociaux, vol. 6, pp. 41–80, 1959.