Memory That Collaborates
When two teams need to combine data, the usual answer is infrastructure. An ETL pipeline, an API, a message bus. The kind of setup that works—until it doesn’t, or until the next maintenance window shows up.
MISRYOUM’s tech desk has been looking closely at an approach that cuts a lot of that overhead. The basic pitch is simple: if your database is an immutable value stored somewhere, anyone who can read the storage can query it. No server to run, no API to negotiate, no data to copy. And if the query language supports multiple inputs, you can join databases from different teams in one expression.
This is how Datahike works, and the more interesting part is that it’s not framed as a bolt-on feature. It falls out of two properties the architecture treats as fundamental. First, databases are values. In a traditional setup, you query through a running server, and the data may change between queries. The database is effectively a service, not something you “hold.” In Datahike, dereference a connection (@conn) and you get an immutable database value—a snapshot frozen at a specific transaction. It won’t change. Pass it to a function, store it in a variable, hand it to another thread.
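To make the "database as a value" idea tangible, here is a minimal Python sketch. It is not Datahike's API: the Conn and Db classes and the deref and transact methods are hypothetical stand-ins, assuming only that a connection hands out immutable snapshots and that a transaction produces a new value rather than changing an old one.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Datom = Tuple[str, str, object]  # (entity, attribute, value)


@dataclass(frozen=True)
class Db:
    """Immutable snapshot: a transaction id plus a frozen set of datoms."""
    tx: int
    datoms: FrozenSet[Datom]


class Conn:
    """Hypothetical mutable reference that points at the latest immutable Db."""

    def __init__(self) -> None:
        self._db = Db(tx=0, datoms=frozenset())

    def deref(self) -> Db:
        # Similar in spirit to @conn: hand out the current snapshot as a value.
        return self._db

    def transact(self, new_datoms) -> Db:
        # Build a new value; snapshots handed out earlier are left untouched.
        self._db = Db(self._db.tx + 1, self._db.datoms | frozenset(new_datoms))
        return self._db


conn = Conn()
conn.transact([("p1", "name", "Widget")])
snapshot = conn.deref()                      # frozen at transaction 1
conn.transact([("p2", "name", "Gadget")])
later = conn.deref()                         # a different value, at transaction 2
```

The earlier snapshot still describes the database as of transaction 1, no matter how many writes follow, which is why it can be passed between functions or threads without locks.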
Two concurrent readers holding the same snapshot always agree. No locks. No coordination dance. The idea traces back to Rich Hickey’s work on Datomic in 2012, which treats perception (reads) as operating on values and therefore needing no coordination. In Datomic, indices live in storage, but the transactor keeps an in-memory overlay of recent index segments that haven’t been flushed yet, so readers need to stay connected to the transactor to get a complete, current view. Datahike removes that dependency because the writer flushes to storage on every transaction. Storage becomes authoritative. Any process that can read the store sees the full, current database: no overlay, no transactor connection needed.
To make that claim feel real, the storage structure matters. Datahike keeps indices in a persistent sorted set—described as a B-tree variant where nodes are immutable. Every node is stored as a key-value pair in konserve, which abstracts over storage backends: S3, filesystem, JDBC, IndexedDB. When a transaction adds data, it doesn’t modify existing nodes. It creates new nodes for the changed path from leaf to root, while the unchanged subtrees get shared with the previous version. This is structural sharing, the same technique behind Clojure’s persistent vectors and Git’s object store.
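The path-copying idea can be sketched in a few lines of Python. This uses a plain persistent binary search tree rather than a B-tree variant, and it does not reflect Datahike's actual persistent sorted set; the Node class and the insert and keys helpers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; children are references to other immutable nodes."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Return a new root. Only nodes on the search path are copied."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present: reuse the existing subtree as-is


def keys(node: Optional[Node]) -> List[int]:
    """In-order traversal, used here only to inspect a version."""
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


v1 = None
for k in (50, 30, 70, 20, 40):
    v1 = insert(v1, k)

v2 = insert(v1, 80)  # copies only the path 50 -> 70, then adds 80
```

After the insert, v1 and v2 are both complete trees. The left subtree rooted at 30 is the very same object in both versions; only the root and the node for 70 were recreated.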
The cost is easy to picture: imagine a B-tree with thousands of nodes holding a million datoms. A transaction that adds ten datoms might rewrite a dozen nodes along the affected paths, while the thousands of untouched nodes are reused. Both old and new snapshots remain valid, complete trees. They just share most of their structure. The crucial property: every node is written once and never modified, so its key can be derived from its content (content addressing). That, in turn, lets nodes be cached aggressively, replicated independently, and read by any process that can access storage, without coordinating with the process that wrote them.
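A rough way to see why write-once nodes pair well with content addressing: if a node's key is the hash of its bytes, writing an unchanged node again is a no-op. The sketch below assumes a plain dict as the key-value store, JSON as the node encoding, and a toy two-level tree; the put and write_tree functions are hypothetical, and none of this is meant as Datahike's or konserve's real layout.

```python
import hashlib
import json

store = {}  # stands in for a key-value backend


def put(node) -> str:
    """Write a node under the SHA-256 of its content; rewriting is a no-op."""
    blob = json.dumps(node, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    store.setdefault(key, blob)
    return key


def write_tree(items, leaf_size=4) -> str:
    """Toy two-level tree: leaves hold sorted chunks, the root lists leaf keys."""
    items = sorted(items)
    leaves = [put({"leaf": items[i:i + leaf_size]})
              for i in range(0, len(items), leaf_size)]
    return put({"children": leaves})


root_v1 = write_tree(range(16))
nodes_after_v1 = len(store)                      # four leaves plus one root

# Appending one item keeps the existing chunk boundaries in this toy layout,
# so the four old leaves hash to keys that are already in the store.
root_v2 = write_tree(list(range(16)) + [16])
new_nodes = len(store) - nodes_after_v1          # one new leaf plus one new root
```

Both roots stay readable, and the second version cost two new writes instead of six. A real index would copy paths rather than rebuild and deduplicate, but the storage-level effect is the same kind of sharing.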
Then there’s the practical part: how a reader actually gets the snapshot. When you call @conn, Datahike fetches one key from the konserve store: the branch head (for example, :db). This returns a small map containing root pointers for each index, schema metadata, and the current transaction ID. Nothing else is loaded immediately; the database value is a lazy handle into the tree. As a query traverses the index, each node is fetched on demand from storage and kept in a local LRU cache, so subsequent queries that hit the same nodes pay no extra I/O. Because the indices live in storage, any process that can read storage can load the branch head, traverse the tree, and run queries, with no server process, no connection protocol, and no port to expose. This is called the distributed index space, and it means two processes reading the same database fetch the same immutable nodes independently.
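The read path described above can be mimicked with a dict and Python's built-in LRU cache. Everything here is an assumption made for illustration: the STORE layout, the node keys n0 to n2, and the fetch, load_node, deref, and contains helpers are hypothetical, and the reads list exists only to show which keys were actually pulled from storage.

```python
from functools import lru_cache

# Hypothetical store layout: one branch head plus immutable tree nodes.
STORE = {
    "db": {"root": "n0", "tx": 42},
    "n0": {"keys": [10], "children": ["n1", "n2"]},
    "n1": {"values": [1, 5, 9]},
    "n2": {"values": [10, 15, 20]},
}
reads = []  # records every key fetched from "storage"


def fetch(key):
    reads.append(key)
    return STORE[key]


@lru_cache(maxsize=128)
def load_node(key):
    # Nodes never change, so a cached copy can never be stale.
    return fetch(key)


def deref():
    """Load only the branch head; the tree itself is fetched lazily."""
    return fetch("db")


def contains(db, value):
    node = load_node(db["root"])
    while "children" in node:
        # Pick the child whose key range covers the value.
        idx = sum(1 for k in node["keys"] if value >= k)
        node = load_node(node["children"][idx])
    return value in node["values"]


db = deref()
found_15 = contains(db, 15)   # fetches n0 and n2
found_20 = contains(db, 20)   # same nodes, served from the cache
found_7 = contains(db, 7)     # fetches n1; n0 is still cached
```

The branch head is the only thing that ever needs re-reading to see new data. Node caching is safe precisely because a given key always maps to the same bytes.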
All of that sets up the part teams will care about most: joining across databases without forcing them into the same system. Because databases are values and Datalog natively supports multiple input sources, you can join databases from different teams, different storage backends, or different points in time in a single query. One team’s product catalog might sit on S3 while another maintains inventory in a separate bucket. A third team can join them without either team doing anything—each @ dereference fetches a branch head from its respective bucket and returns an immutable database value, and the query engine joins locally with no server coordinating between them and no data copied. Also, you can mix snapshots from different points in time: the old snapshot and the current one are both just values, so the query engine doesn’t care when they were taken. (A little personal detail: I noticed this in a demo log because the room suddenly got quiet—the little keyboard clicks stopped, like everyone was waiting for the join result to settle.)
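As a sketch of what joining two database values looks like, the Python below treats each database as a frozen set of datoms and joins them on a shared SKU. In Datahike this would be a Datalog query with more than one input source; the attribute names, the sample data, and the attr and names_with_stock helpers here are invented for illustration.

```python
# Two independent "databases", each just an immutable set of
# (entity, attribute, value) datoms. All names and data are hypothetical.
catalog = frozenset({
    ("p1", "product/sku", "A-100"),
    ("p1", "product/name", "Widget"),
    ("p2", "product/sku", "B-200"),
    ("p2", "product/name", "Gadget"),
})
inventory_old = frozenset({
    ("i1", "stock/sku", "A-100"),
    ("i1", "stock/count", 12),
})
inventory_now = inventory_old | {
    ("i2", "stock/sku", "B-200"),
    ("i2", "stock/count", 0),
}


def attr(db, attribute):
    """Map entity -> value for one attribute of one database value."""
    return {e: v for (e, a, v) in db if a == attribute}


def names_with_stock(catalog_db, inventory_db):
    """Join the two inputs on SKU and return sorted (name, count) pairs."""
    sku_of = attr(catalog_db, "product/sku")
    name_of = attr(catalog_db, "product/name")
    stock_sku = attr(inventory_db, "stock/sku")
    count_of = attr(inventory_db, "stock/count")
    count_by_sku = {sku: count_of[e] for e, sku in stock_sku.items()}
    return sorted((name_of[e], count_by_sku[sku])
                  for e, sku in sku_of.items() if sku in count_by_sku)


current = names_with_stock(catalog, inventory_now)
historic = names_with_stock(catalog, inventory_old)  # same query, older snapshot
```

The join function never asks where its inputs came from or when they were taken. Swapping in an older inventory snapshot is just passing a different argument.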
So far “storage” has meant S3 or filesystem, but konserve also has an IndexedDB backend, which means the same model works in a browser. With Kabel WebSocket sync and konserve-sync, a browser client replicates a database locally into IndexedDB. Queries run against the local replica with zero network round-trips, and updates sync differentially: only the changed tree nodes are transmitted. The structural sharing that makes snapshots cheap on the server makes sync cheap over the wire too.
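Differential sync over immutable nodes can be sketched as a tree walk that stops wherever the local replica already has a node. This is not how konserve-sync is implemented; the sync function, the dict-based stores, and the node keys are assumptions, and the sketch relies on the simplification that a node present locally implies its whole subtree is present too.

```python
def sync(remote, local, root):
    """Copy only the nodes the local replica lacks; return the keys sent."""
    sent, stack = [], [root]
    while stack:
        key = stack.pop()
        if key in local:
            # Nodes are immutable, so a known key means a known subtree
            # (assuming earlier syncs ran to completion).
            continue
        local[key] = remote[key]
        sent.append(key)
        stack.extend(remote[key].get("children", []))
    return sorted(sent)


# Hypothetical remote store holding two versions that share leaf "a".
remote = {
    "r1": {"children": ["a", "b"]},
    "r2": {"children": ["a", "b2"]},
    "a": {"values": [1, 2]},
    "b": {"values": [3, 4]},
    "b2": {"values": [3, 4, 5]},
}
local = {}

first = sync(remote, local, "r1")    # initial replication: everything under r1
second = sync(remote, local, "r2")   # update: only the new root and changed leaf
```

The second sync skips leaf "a" entirely because the replica already holds it, which is the same sharing that made the snapshot cheap to write in the first place.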
The last detail MISRYOUM’s editorial desk kept coming back to is that you can replace :memory with :s3, :file, or :jdbc and the same code works across storage backends. Databases don’t have to share a backend: an S3 database can be joined against a local file store in the same query. And once you start thinking that way (values, snapshots, joins without coordination), it’s hard to go back to the old pattern entirely. Or maybe that’s just me, mid-thought, still impressed that the whole thing is basically stored memory that collaborates.