Every AI system today forces the same choice: useful AI or private AI. If the AI can search your documents, it can read them. If it can read them, so can the people who run the servers. Most AI products resolve this by skipping encryption entirely — your data sits in plaintext behind access controls, and you’re asked to trust that nobody misuses it.
That tradeoff was the thing I most wanted to crack. Not because privacy is a nice feature. Because if you’re building a system where people store their thinking — their dead ends, their breakthroughs, the connections they’re still working out — the server not being able to read it isn’t optional. It’s the whole point. That’s why I’m building THE WHEEL — a private knowledge engine where you own your information. Disclosure is opt-in. Nothing gets shared, trained on, or surfaced without your explicit choice.
I believed it was possible to have both: real intelligence over truly private data. The answer required crossing four fields that don’t normally talk to each other — and a seventeen-year-old episode of Doctor Who.
The problem
Modern AI search doesn’t work like keyword matching. It works through vector embeddings — mathematical representations of meaning. When you write a note about “planning a surprise party for Mom’s 70th,” the AI converts that into a point in a high-dimensional space. When you later ask “what are we doing for Mom’s birthday?”, it converts your question into another point and finds the nearest neighbors. The question and the note end up close together because they mean similar things, even though they share almost no words.
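The nearest-neighbor step can be sketched in a few lines. This is a toy illustration, not production code: the three-dimensional vectors are made-up stand-ins for real embedding-model output, which runs to hundreds or thousands of dimensions, and the note texts are the post's own example.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" -- hand-picked so that semantically
# related texts sit near each other, as a real model would place them.
notes = {
    "surprise party for Mom's 70th": [0.9, 0.4, 0.1],
    "quarterly tax filing checklist": [0.1, 0.2, 0.95],
}
query = [0.85, 0.5, 0.15]  # "what are we doing for Mom's birthday?"

# Nearest neighbor by cosine similarity: the birthday note wins
# even though query and note share almost no words.
best = max(notes, key=lambda k: cosine_similarity(query, notes[k]))
```

The same operation, run over millions of stored vectors with an approximate index, is what every RAG system does at query time.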
This is powerful. In a multi-tenant system, it is also a privacy problem. Not because of a bug. Because of the math.
Those embeddings encode semantic relationships. If two users in the same system both write about birthday parties, their embeddings sit near each other in the same mathematical space. A query from User A can land close enough to User B’s data to surface results across the privacy boundary. The search mechanism itself leaks information. Encrypting the data doesn’t solve this if the search still operates on shared mathematical relationships. And if you encrypt the embeddings themselves, the geometric relationships that make semantic search work are destroyed. You’ve protected the data and killed the intelligence.
When I dug into the research, the practical consensus was clear. Microsoft’s SEAL team, the leading implementation of homomorphic encryption for ML, has noted that practical performance for complex operations like nearest neighbor search remains an open problem. Researchers continue exploring homomorphic approaches, but even state-of-the-art systems like Compass report search latencies in the range of seconds per query — acceptable for some applications, but orders of magnitude too slow for interactive use. Yao et al. proved in 2013 that secure order-preserving encryption schemes cannot find exact encrypted nearest neighbors — a fundamental mathematical limit, not an implementation detail. The academic literature frames it as a trilemma: performance, privacy, intelligence — pick two.
Privacy plus intelligence means a separate encrypted database per tenant. Physically isolated, semantically searchable within each silo. It works in theory. But the economics don’t: separate database infrastructure per customer, separate maintenance, separate scaling. No major SaaS product ships this way because the cost structure doesn’t scale.
Performance plus intelligence is what every major AI product ships today. Shared infrastructure, plaintext data, access controls layered on top. Row-level security, role-based access. These are real tools and they work. But they’re policy enforcement, not mathematical guarantees. A database administrator can read everything. A misconfigured policy exposes everything. The threat model always includes the people who run the infrastructure — and that’s a threat model most companies quietly accept.
Performance plus privacy gives you secure storage with no intelligence. Encrypted, isolated, unsearchable. A vault, not a knowledge engine.
There’s a fourth path that people reasonably point to: run everything on-device. If the model lives on the user’s hardware and the data never leaves, the privacy problem disappears. And for some use cases that works. But you can’t run frontier-scale models on a phone or a laptop — not the models that are actually good at reasoning, synthesis, and connecting ideas across large bodies of knowledge. On-device gives you privacy, but it caps the intelligence at whatever fits on the hardware. The ceiling is real and it’s low.
I wasn’t willing to pick two. And I wasn’t willing to accept a lower ceiling.
Not a technical reader? The next four sections walk through cryptography, information retrieval, key management, and distributed systems — each one contributed a constraint that narrowed the solution space. If you want the answer without the journey, skip to “The connection” where the Doctor Who metaphor returns.
Field one: cryptography
The first approach I explored was searchable encryption — a field with decades of research. The standard technique uses deterministic tokens: hash each search term with a secret key, and the server can match hashed queries against hashed content without seeing the plaintext.
I built it. It worked in the narrow sense. But deterministic encryption has a fundamental flaw in multi-tenant systems: it preserves patterns. If two users search for the same term, both produce identical tokens. The server doesn’t know what the word is, but it knows both users searched for the same thing. Over time, it accumulates a frequency map. This is frequency analysis — one of the oldest attacks in cryptography.
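The leak is easy to demonstrate. Here is a minimal sketch of the deterministic-token scheme, with a hypothetical shared index key standing in for real key material:

```python
import hashlib
import hmac

# Hypothetical server-side secret; in a real deployment this would be
# a properly managed key, but the flaw below doesn't depend on that.
INDEX_KEY = b"shared-search-index-key"

def search_token(term: str) -> str:
    # Deterministic: the same term always maps to the same token.
    return hmac.new(INDEX_KEY, term.encode(), hashlib.sha256).hexdigest()

# Two different users search for the same concept...
token_user_a = search_token("birthday party")
token_user_b = search_token("birthday party")

# ...and the server, which never sees the plaintext, still learns
# that both users searched for the same thing:
assert token_user_a == token_user_b
```

The server can't invert the HMAC, but it doesn't need to. Over enough queries it accumulates a token frequency map, and classical frequency analysis maps the most common tokens back to the most common plaintexts.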
This isn’t a novel observation. Naveed, Kamara, and Wright demonstrated in 2015 that frequency analysis could recover plaintext from deterministically encrypted databases. Lacharité and Paterson confirmed the vulnerability is fundamental to the approach, not an implementation detail. Any scheme that preserves patterns across users leaks information across users.
First constraint: isolation had to be mathematical, operating at a level deeper than encryption.
Field two: information retrieval
While exploring encryption, I was simultaneously building the AI chat system — standard retrieval-augmented generation using vector embeddings and cosine similarity.
During stabilization, a contamination event made the problem concrete. Test embeddings from one entity context got mixed with production embeddings from another. The similarity scores were nonsensical. The system returned results anyway. The AI incorporated the contaminated context into its responses and produced answers that sounded reasonable but were built on the wrong user’s information.
Plausibly wrong rather than obviously wrong. That’s the dangerous kind.
The bug was resolved quickly. But it confirmed what Song and Raghunathan’s 2020 work on information leakage in embedding models had already shown: embeddings leak membership, attribute, and content information in ways that traditional access controls don’t address. Recent work on semantic leakage from compressed embeddings demonstrates that even without exact reconstruction, embeddings can leak semantic information across privacy boundaries. If Entity A’s embeddings exist in the same coordinate system as Entity B’s, a query from Entity A can land close to Entity B’s vectors. Not because of an encryption failure. Because of geometry.
Second constraint: isolation had to happen at the level of the coordinate system itself. Shared space means shared risk.
Field three: key management
Building the encryption infrastructure required deep work on hardware security modules and key hierarchies. The architecture went through three complete iterations before reaching production quality.
The core principle: application code should never touch raw key material. Keys live in a hardware security module. The HSM processes requests and returns results. The application never holds, sees, or caches a root key. Settling on that principle was one thing; implementing it was, as you might recall, a whole other story.
More importantly for the search problem, the work produced a key hierarchy with genuine per-entity isolation. Each user has a master key generated client-side — a random 256-bit key that never leaves the browser. Each entity gets its own entity key, generated independently. Members access the entity key through a wrapping protocol: the entity key is encrypted with each member’s master key, so the server stores wrapped blobs it can’t read, and each member unwraps the entity key client-side when they need it. From there, purpose-specific sub-keys are derived using HKDF per NIST SP 800-108r1 — one key for wrapping, another for metadata, each scoped to a single cryptographic purpose.
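The purpose-scoping step can be sketched with the standard HKDF extract-and-expand construction (RFC 5869). The key sizes and purpose labels below are illustrative, not THE WHEEL's actual parameters, and a real implementation would use a vetted library rather than hand-rolled derivation:

```python
import hashlib
import hmac
import secrets

def hkdf(key_material: bytes, info: bytes, length: int = 32) -> bytes:
    # Extract-and-expand: a salt-free extract, then a single expand
    # round, which suffices for outputs up to 32 bytes with SHA-256.
    prk = hmac.new(b"\x00" * 32, key_material, hashlib.sha256).digest()
    okm = hmac.new(prk, info + b"\x01", hashlib.sha256).digest()
    return okm[:length]

# Each entity key is generated independently (client-side, in the
# system described above); the purpose labels are hypothetical.
entity_key = secrets.token_bytes(32)
wrapping_key = hkdf(entity_key, b"purpose:wrapping")
metadata_key = hkdf(entity_key, b"purpose:metadata")

# Different purposes yield unrelated sub-keys from the same entity key,
# so compromising one scope reveals nothing about another.
assert wrapping_key != metadata_key
```

The point of the `info` label is exactly the scoping described above: one derivation path per cryptographic purpose, no key ever reused across purposes.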
The result: every entity has a unique cryptographic identity, and no key is ever reused across purposes or shared in plaintext. I was building it for encryption. The connection to the search problem — that these same per-entity keys could drive mathematical isolation of embedding spaces — was forming but hadn’t crystallized.
Field four: distributed systems
THE WHEEL runs on serverless infrastructure. Containers spin up, handle requests, and get recycled. No persistent server. This forced a series of architectural decisions that turned out to be directly relevant.
A caching layer stored partially decrypted content in Redis for performance; after a connection dropped and recovered, cached decrypted data leaked into subsequent requests. Session keys were cached across entity context switches: a user would move from one entity context to another, and the system continued using the old entity's cached key, serving correctly decrypted data from the wrong entity in the right entity's interface. Document content was encrypted, but filenames stayed plaintext: "2025-tax-return-john-doe.pdf" leaks identity, document type, and temporal information without any decryption.
Each incident pointed to the same principle. Encryption isn’t a property of data at rest. It’s a property of the entire data lifecycle. Every surface where information exists is a surface where information can leak. As Rogaway argued in his 2004 work on nonce-based symmetric encryption, the security of a scheme must encompass all the ways its components interact with the environment. The security model has to be as ephemeral as the infrastructure: decrypt only what’s needed, only when authorized, hold it only in volatile memory, cryptographically erase it when the operation completes.
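The decrypt-use-erase pattern can be sketched as a scoped context. This is illustrative only: a garbage-collected runtime can't fully guarantee that no copies of the bytes survive, so production code would hold plaintext in locked, non-swappable native memory instead.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral(buffer: bytearray):
    # Hold sensitive bytes only for the scope of one operation,
    # then overwrite them in place when the operation completes.
    try:
        yield buffer
    finally:
        for i in range(len(buffer)):
            buffer[i] = 0

# Pretend this came out of an authorized, scoped decryption step.
plaintext = bytearray(b"decrypted-for-this-request-only")
with ephemeral(plaintext) as buf:
    # Scoped use: hand the bytes to the model, build a summary, etc.
    request_payload = bytes(buf)

# After the scope closes, the buffer has been overwritten with zeros.
assert all(b == 0 for b in plaintext)
```

The shape matters more than the mechanism: decryption, use, and erasure live in one lexical scope, so nothing survives the request that produced it.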
The connection
Four constraints defined the solution space. From cryptography: deterministic approaches leak patterns; isolation has to be mathematical. From information retrieval: shared embedding space is itself a privacy violation; isolation has to happen at the coordinate system level. From key management: every entity already has a unique, mathematically derived identity. From distributed systems: every operation has to be ephemeral, scoped, and erased.
The question was precise: how do you make embeddings from different entities coexist in the same database while being mathematically invisible to each other?
The answer came from an unexpected direction.
I’ve been rewatching Doctor Who with my kids. We’re deep in the David Tennant era, working through it episode by episode. (They’re obsessed. I’m not complaining.) In “The Stolen Earth,” the Daleks steal 27 planets and hide them in the Medusa Cascade. The Doctor can’t find them. They’re not behind a wall or in another dimension. They’re in the same place — shifted one second out of sync with the rest of the universe. “The perfect hiding place. Tiny little pocket of time.” The planets are right there, occupying the same space as everything else, completely invisible, because they’ve been offset in a dimension nobody thought to check.
I’d been carrying that image for weeks while working through the embedding isolation problem. And it mapped precisely onto what I needed.
Each entity’s embeddings could exist in the same vector database, shifted into a private orientation. Not encrypted in the traditional sense. Transformed so that the geometric relationships within one entity’s data are perfectly preserved — similar things stay similar, distant things stay distant — while the relationships between entities become mathematically meaningless. The transformation preserves cosine similarity exactly within an entity’s space — search quality is identical to operating on the original embeddings — while cross-entity comparisons return results statistically indistinguishable from random. Entity A’s embeddings and Entity B’s embeddings occupy the same database, shifted into orientations nobody else can see. Like planets one second out of sync. Right there. Invisible.
The mathematical technique comes from linear algebra — a class of transformations that maintain search accuracy within an entity’s data while making cross-entity comparison meaningless. The core idea has precedent: the ASPE scheme used matrix transformations to preserve inner products for searchable encryption, and recent work has explored similar constructions for other privacy problems. ASPE was subsequently broken — its vulnerability was that it used a shared matrix across users, which made it susceptible to reconstruction attacks given enough query-result pairs. The construction here avoids that: each entity’s matrix is generated independently at random using a cryptographically secure random source, so there is no shared structure for an attacker to exploit across tenants. The matrix is then encrypted under the entity’s key from the existing hierarchy and stored. The key doesn’t seed the matrix; it guards it. Apply the transformation before storage. Reverse it during authorized retrieval.
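A toy version of the idea fits in a few dozen lines: a random orthogonal matrix per entity preserves cosine similarity exactly within that entity's space while making cross-entity comparisons meaningless. The dimension, the Gram-Schmidt construction, and the function names below are illustrative, not the production implementation.

```python
import math
import random

def random_orthogonal(dim, rng):
    # Gram-Schmidt on random Gaussian vectors yields a random
    # orthogonal matrix (rows are orthonormal). Production would draw
    # from a cryptographically secure source, as described above.
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for b in basis:
            d = sum(x * y for x, y in zip(v, b))
            v = [x - d * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis

def transform(matrix, vec):
    # Matrix-vector product: rotate the embedding into the entity's
    # private orientation.
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

rng = random.SystemRandom()
entity_a = random_orthogonal(8, rng)  # independent matrix per entity
entity_b = random_orthogonal(8, rng)

doc = [rng.gauss(0, 1) for _ in range(8)]
query = [rng.gauss(0, 1) for _ in range(8)]

# Within one entity's space, similarity is preserved exactly
# (up to floating-point error): search quality is unchanged.
before = cos(query, doc)
after = cos(transform(entity_a, query), transform(entity_a, doc))

# Across entities, the comparison is geometrically meaningless:
# A's query against B's document yields an unrelated score.
cross = cos(transform(entity_a, query), transform(entity_b, doc))
```

Orthogonality is what makes both properties hold at once: `Q` preserves dot products and norms, so angles inside an entity's space survive intact, while two independently drawn matrices share no structure for an attacker to correlate.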
This solves three problems simultaneously.
First: the server can search without reading. When you submit a query, the server is granted temporary access to your entity’s transformation matrix, transforms the query, and access is immediately revoked. It searches the transformed embeddings in your space and returns correct results — without ever seeing what those results actually say. The embeddings are transformed, the content they point to is encrypted, and knowing the embedding doesn’t reveal the text.
Second: it cannot leak across tenants. Each entity has a different transformation matrix, generated independently at random and protected by a different key. The server can’t use your query to search someone else’s space because applying your transformation to their embeddings returns meaningless results. The transformations are mathematically incompatible.
Third: it’s not subject to frequency analysis. Unlike deterministic encryption where identical queries produce identical tokens, each entity’s transformation is different. The server can’t build a frequency map across users because the same semantic concept transforms differently in each entity’s space.
This isn’t encrypted computation. The embeddings are transformed once at write time, the query is transformed at search time, and the search itself is standard vector similarity — the same operation every RAG system already runs. The privacy comes from the transformation, not from computing over encrypted data. That’s why the performance cost is effectively zero: no homomorphic operations, no secure enclaves, no latency penalty. The Compass comparison above is architectural, not competitive — those systems compute over encrypted data; this one transforms data before storage and searches it normally.
If you’re interested in the mathematics behind this, I’d be happy to talk your ear off about it. Get in touch.
The server searches your data without being able to read it. It can’t search anyone else’s data for the same reason the Doctor couldn’t find Earth — it’s right there, in the same database, shifted into an orientation nobody else can see.
Zero-knowledge search, zero-trust everything else
There’s an important distinction here, and precision matters because the terms have specific technical meanings.
The search layer is genuinely zero-knowledge in a specific sense: the server performs semantic similarity search over transformed embeddings without accessing plaintext content. It finds relevant results without knowing what they're about. This is the part that hadn't shipped before.

One thing the server can still observe is structural geometry: cluster density, how many groups of similar documents exist, how spread out the space is. The transformation preserves angles exactly (that's what makes search work), so it also preserves that structural shape. The server can't recover what the clusters mean, but it can see that they exist. This is inherent to any distance-preserving approach and isn't addressable without degrading search quality. It's the honest accounting of what zero-knowledge search does and doesn't mean.
But search is only half the problem. Once relevant results are found, you need to do something with them — summarize, answer questions, connect ideas. That requires a language model, and a language model needs to process actual content. At some point, if you want AI reasoning over your data, something has to see the data.
So the architecture has two layers. The search layer is zero-knowledge: when you ask “what did I write about Mom’s birthday?”, the server finds the relevant notes without reading them. The processing layer is zero-trust: when the AI summarizes those notes, the server decrypts them for that specific operation, processes them in volatile memory, and cryptographically erases them when done. No caching. No logging. No training. The server accesses content only during authorized processing, only for the scoped operation, and retains nothing afterward.
This isn’t a compromise. It’s the only honest architecture I’ve found. If someone has figured out full AI reasoning over data the server genuinely never sees, I’d love to read that paper. But the real question shouldn’t have been “does the server ever see the data?” It should have been “does the server get to keep it, share it, train on it, or access it without explicit authorization?” In this architecture, the answer is no.
Privacy as a right means something specific: opt-in, always. We don’t train on your data without permission. We don’t sell it. We don’t secretly use it. We’d rather have fewer users with full sovereignty than more users with compromised privacy. When THE WHEEL’s privacy policy goes live, you’ll be able to read exactly these commitments in plain language. The architecture enforces this, but the architecture exists because the conviction came first.
The unsolved part
The thing the research community had treated as an open problem — meaningful semantic search over encrypted data in a multi-tenant system — turned out to have an answer. It just wasn’t reachable from inside any single discipline. The cryptographers were right that you can’t search encrypted data with conventional encryption. The information retrieval researchers were right that semantic similarity requires geometric relationships. The distributed systems engineers were right that ephemeral processing is the only secure model. Everyone was right about the constraints of their own field. The solution lived in the space between them.
In the first post in this series, I wrote about what happens when AI raises the floor — when the baseline capability of a single person expands enough that they can operate credibly in domains that used to require years of specialization. This is what that looks like. Not expertise in four fields. Access to four fields, long enough to understand the constraints each one contributes, with enough stubbornness to keep holding them all at once until the connection appeared. The connection between a key hierarchy and a vector space through the right class of linear transformations isn’t one that existing literature had made. What changed is that one person could actually reach all four fields in a finite amount of time — with enough chutzpah to reject the trilemma.
The AI provided the raw material — the ability to explore cryptographic primitives, embedding mathematics, key derivation specifications, and serverless security patterns at a pace I couldn’t have reached alone. The connection itself came from holding the right constraints long enough and the right metaphor clicking: objects in the same space, invisible to each other, because they’ve been shifted in a dimension nobody thought to check.
The AI doesn’t watch Doctor Who with its kids and carry an image around for weeks until it maps onto a mathematical problem. That’s the human part. That’s what made this work.
The server searches your data without being able to read it. It can’t search anyone else’s data for the same reason the Doctor couldn’t find Earth — it’s right there, in the same database, shifted into an orientation nobody else can see. Twenty-seven planets, perfectly hidden. Millions of embeddings, mathematically isolated.
The research community treated it as an open problem. It turned out the solution required crossing four fields that don’t normally talk to each other — and carrying an image from a seventeen-year-old TV show around long enough for it to map onto the math.