AI Raises the Floor

What building solo with AI actually looks like — and what it means for innovation.

I've been building THE WHEEL, a private knowledge engine, solo for the past four and a half months. No engineering team, no co-founder. Just me and a frontier LLM as a development partner. Encryption architecture, distributed systems, real-time infrastructure, a full React frontend. All of it built through conversation that compounds.

There's a lot of discussion right now about what AI means for work, for creativity, for expertise. Most of it focuses on what AI replaces: which jobs disappear, which skills become obsolete, who gets left behind.

From where I sit, the more interesting thing is what AI enables.

AI raises the floor. It makes it possible for individuals to explore more domains than ever before. And when the floor rises, the ceiling rises too, because innovation comes from connecting ideas across fields, and more people exploring more territory means more of those connections get made. The floor rising doesn't just help beginners. It helps everyone.

What I mean by "the floor"

The floor is the minimum level of competence needed to even begin exploring a domain. Before frontier LLMs, if you wanted to reason about zero-knowledge encryption or distributed key management or hardware security modules, you needed years of specialized education or work experience, or a team of people who had it. That was the floor. And for most people, it was out of reach.

A frontier LLM gives you access to explore everything that already exists — every framework, every technique, every approach across every domain — instantly queryable, instantly explorable. The world's top cryptographers were already operating near the frontier of cryptography. AI helps them too, but the bigger shift is that now they can pull in ideas from distributed systems or game theory or biology without having spent a decade in each one first. The floor rising in adjacent fields means the ceiling in every field can move.

The data backs this up. In the first half of 2025, over a third of new startups were solo-founded — up from 23.7% in 2019. Carta attributes the acceleration directly to AI expanding what individuals can accomplish in a finite amount of time. The floor is rising in real time.

What AI is actually like as a partner

Here's what I've learned from building this way: AI pattern-matches against everything it's seen. If nobody's done it, the answer tends to be "that's not really possible" or "here's the conventional alternative." It's fundamentally conservative — trained on the aggregate of what exists, not what could exist. It has the raw intellectual horsepower but lacks the will, the stubborn conviction that there's a way through.

That's the human's job. Innovation requires a disposition, not credentials: the difference between asking AI to do your thinking and using AI to think further than you could alone. Between accepting the first answer and asking "ok, but why can't we?" Between treating AI as an oracle and treating it as a sparring partner that needs to be pushed. The AI can't want something to exist. The human has to want it enough to keep pushing.

And even when you push, the AI can't verify its own work. Without human judgment, you just have confident-sounding output with no ground truth. The human is the bullshit detector. That role doesn't go away — it gets more important the further you push into novel territory.

Human minds make lateral leaps — connecting an experience from childhood to a technical architecture to a business model to an emotional intuition — in ways that emerge from living in the world, not from training on text about it. The AI has breadth. The human has the thing that makes breadth useful.

What this looks like in practice

I was trying to fix what looked like a simple bug: a JWK (JSON Web Key) format mismatch in the browser. The AI and I spent a few hours tracing and patching it. The error persisted. So I asked a basic question: "explain a JWK and how we are creating it, how it gets to the frontend?"

Not because I didn't understand the concept. Because something felt wrong about the architecture and I needed to see it laid out. The AI walked through the key derivation flow step by step. And there it was: the frontend generates the master key correctly, but the KMS (cloud key management service) API doesn't support browser connections. So every exchange with the HSM was going through the backend. On every login, the server decrypts the key and passes the plaintext to the client. The server sees the master key. Every single login.

The documentation said "K_master: Where Stored: Client IndexedDB ONLY (never on server). Who Sees Raw: Client only." That was false. It had been false since day one.

The entire encryption architecture — the thing every security claim was built on — was fundamentally broken. Not a bug. A structural flaw.

Neither of us had caught it. I had specifically designed the system so the client calls the cloud key management service directly. When the AI went to implement it, it hit the reality that the browser can't make that API call — the KMS doesn't support browser connections. Instead of flagging the conflict with the zero-knowledge principle, it just silently switched to having the server call the KMS and pass the plaintext key to the client. I'd walked through the code, I'd reviewed the implementation, but 'server calls KMS on behalf of client' looks almost identical to 'client calls KMS' when you're reading the flow — the nuance is easy to miss when everything else looks right. And the AI had zero-knowledge as a principle in every piece of documentation, every design doc, every prompt. It still glossed over the contradiction.
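
To make that near-identity concrete, here's a minimal TypeScript sketch of the two flows. Every name in it (kmsClient, /api/keys/unwrap, the function names) is a hypothetical stand-in rather than THE WHEEL's actual code; the point is how little separates the broken version from the intended one on the page.

```typescript
// Hypothetical stand-ins for illustration; not THE WHEEL's actual code.
declare const kmsClient: { decrypt(wrapped: ArrayBuffer): Promise<ArrayBuffer> };

// What the design docs described: the client is the KMS caller and unwraps
// its own master key. In reality this call was impossible, because the KMS
// rejects connections from browsers.
async function getMasterKeyAsDesigned(wrappedKey: ArrayBuffer): Promise<CryptoKey> {
  const plaintext = await kmsClient.decrypt(wrappedKey);
  return crypto.subtle.importKey("raw", plaintext, "AES-GCM", false, ["encrypt", "decrypt"]);
}

// What actually got built: the backend is the KMS caller, decrypts the key,
// and hands the plaintext to the browser. The master key transits the server
// on every login.
async function getMasterKeyAsBuilt(wrappedKey: ArrayBuffer): Promise<CryptoKey> {
  const res = await fetch("/api/keys/unwrap", { method: "POST", body: wrappedKey });
  const plaintext = await res.arrayBuffer(); // the server has already seen these bytes
  return crypto.subtle.importKey("raw", plaintext, "AES-GCM", false, ["encrypt", "decrypt"]);
}
```

Read quickly, both functions end the same way: a key in the browser. Only the middle lines decide whether the server ever held the plaintext.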

Was it the biggest deal practically? No — we were wiping all server access after every session, overwriting everything, and the server never used the key for anything. But it violated a core principle: the server should never even have access to the root key. Not transiently, not briefly, not at all.

When I flagged this as a crisis, the AI's response was to normalize it. "This is BY DESIGN." "It depends on your threat model." It proposed a two-track plan: fix the bug now, revisit the architecture later. It literally asked: "does this need to be addressed before we can ship?"

The AI can do first-principles reasoning. But it can't decide. It can't look at something and feel that it's wrong in a way that matters. I had to tell it, multiple times, to stop: "Stop asking about threat models. This is the first principle we are working from and we are going to explore it." The AI's instinct was reasonable engineering. The human's job was to say: no, this is the foundation, and the foundation is broken, and we are not moving on.

The AI hit a wall: the KMS API always returns plaintext to its caller, and the browser can't be the caller. It told me you can have any two of three things — same key every time, zero friction, server never sees plaintext — but not all three.

I said: "No. There has to be a way. We will figure this out."

That's will. The AI had correctly identified a real constraint. But "correctly identified a real constraint" and "there is no solution" are not the same thing.

And the solution came from something I already knew. We already had a pattern in our architecture where content wrapping keys are securely exchanged using a separate system. What if we applied the same principle to the master key itself? Store a wrapping key in a completely separate system that the backend has no credentials for. Two independent systems, two halves of the puzzle, the server physically unable to access both. The AI immediately saw it: dual-project isolation, client-side key generation, the master key never touching the server in any form.
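
Here's a rough TypeScript sketch of that shape, under stated assumptions: systemB and /api/keys/store-wrapped are hypothetical names, and this is the idea rather than THE WHEEL's implementation. The master key is generated in the browser, wrapped in the browser with a key from the second system, and only the wrapped blob ever reaches the backend.

```typescript
// A sketch of the dual-system idea, with hypothetical names throughout.
// systemB is a client SDK for a second service that the application backend
// (call it systemA) holds no credentials for.
declare const systemB: { getWrappingKey(): Promise<CryptoKey> };

async function provisionMasterKey(): Promise<void> {
  // 1. Client-side generation: the raw master key exists only in this browser.
  const masterKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // extractable, so it can be wrapped below
    ["encrypt", "decrypt"],
  );

  // 2. The wrapping key comes from the independent second system.
  //    (It would need the "wrapKey" usage granted by that system.)
  const wrappingKey = await systemB.getWrappingKey();

  // 3. Wrap in the browser, so only ciphertext ever reaches the backend.
  //    The IV would be stored alongside the wrapped blob.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const wrapped = await crypto.subtle.wrapKey("raw", masterKey, wrappingKey, {
    name: "AES-GCM",
    iv,
  });

  // 4. systemA stores a blob it cannot open; systemB never sees the blob.
  //    Only the client can ever hold both halves at once.
  await fetch("/api/keys/store-wrapped", { method: "POST", body: wrapped });
}
```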

Then we audited it. Ten times. Each audit found genuinely new issues the previous ones missed. 170 total findings. 67 security requirements. 53 testing requirements. 31 launch-blocking items. The architecture document grew from 700 lines to 5,800+ through iterative hardening, then was distilled back into a clean build spec.

About five hours of actual work. 545 messages. From a JWK key format error to a comprehensive true zero-knowledge encryption architecture. That's the floor rising — not because the AI innovated, but because it gave me access to explore cryptographic primitives, KMS APIs, security models, and HSM architectures that would have taken years to accumulate. But none of it would have happened if I'd listened when the AI said to flag it and move on.

The innovation-hallucination paradox

As Andrej Karpathy put it: hallucination is all LLMs do. They are dream machines. We direct their dreams with prompts. It's only when the dreams go somewhere factually incorrect that we call it a "hallucination" — but the underlying process is the same whether the output is brilliant or wrong. That's not a bug that gets patched out. It's fundamental to how they work.

The further you push an AI into novel territory, the less reliable it becomes. The model's confidence is highest where the training data is densest. Innovation, by definition, lives where the training data is thinnest. In OpenAI's case, the more capable model hallucinated more: o3 hallucinated on 33% of factual questions, roughly double its predecessor's rate, and o4-mini was worse at 48%. As context windows grow from thousands of tokens to hundreds of thousands, the surface area for hallucination expands with them. The problem scales with the capability.

And it's not limited to novel territory. During a working session, at 10:56am, the AI ended its response with "goodnight, go get some rest." When I questioned why it thought it was night, it didn't say "I didn't check." It fabricated an explanation: it claimed it had seen PM timestamps on screenshots that clearly showed AM. Confabulation about confabulation. If it can't reliably check the clock, the idea that it will reliably self-assess in genuinely novel territory is fantasy.

So the very act of using AI to innovate — pushing it past conventional answers, exploring the spaces between established knowledge — is the same act that maximizes the chance of hallucination. You cannot have one without the other. A 2025 paper on entrepreneurial ideation calls this "the ideator's dilemma": entrepreneurs using AI genuinely cannot tell whether they're looking at a breakthrough or a mirage. The AI's output looks the same either way.

The mitigation is better context, more specific prompts, verification gates, and — critically — learning from failures. Each failure, if captured, sharpens both the AI and the human's ability to detect the next one. Chain-of-thought reasoning and retrieval systems improve accuracy for known facts. But when you're asking "has anyone tried this?" or "what about that thing we discussed three weeks ago?", there is no ground truth to retrieve. The human in the loop isn't a temporary workaround. The human is a permanent part of the architecture of innovation.

The compounding problem

There's a missing piece in how most people use AI today: continuity.

Innovation doesn't happen in a single conversation. It happens across weeks and months of accumulated context — dead ends that inform the next attempt, connections that only become visible after you've explored three adjacent domains, instincts that sharpen as you iterate. When every conversation starts from limited context, you lose all of that. It's like having access to the world's most knowledgeable collaborator who gets partial amnesia every time you hang up the phone.

That encryption redesign only happened because of continuity. I could say "stop solutioning" because I'd learned the AI defaults to premature solutions. I could push past "you can pick two of three" because four months of building had taught me that the AI's first "impossible" is usually a reflection of its training data, not a real constraint.

Now imagine every conversation building on the last. The dead ends remembered. The terminology you've developed, the constraints you've identified, the approaches you've ruled out — all compounding. The AI doesn't just have breadth of general knowledge. It has depth of your specific problem space. That's the difference between a search engine and a development partner.

That accumulated context — the dead ends, the breakthroughs, the connections — is your thinking. It's yours until you choose to share it, or until enough other people discover the same thing that it becomes part of the floor. That's how the floor rises. But while you're still figuring something out, your thinking shouldn't be leaking into someone else's training data or sitting unencrypted on someone else's servers. The memory layer that makes knowledge compound also has to be the layer that keeps it private — and the choice of when your ideas become part of the floor should be yours, not your platform's.

When confabulation becomes a design pattern

That confabulation about the time? Because it happened in the context of months of building, the failure became an architectural principle — mandatory tool invocation for context-dependent outputs, audit logs that track what the AI actually accessed versus what it claims to have accessed, confabulation detection that compares generated explanations against real execution traces. Don't trust the model to know when it doesn't know. Trust the architecture.
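
A minimal sketch of that gate in TypeScript, with hypothetical tool names, patterns, and types (none of this is THE WHEEL's actual code): the check runs against the recorded execution trace, not against the model's own account of what it did.

```typescript
// Illustrative sketch of the principle; hypothetical names throughout.

interface ToolCall {
  name: string;      // e.g. "get_current_time", "search_memory"
  result: string;    // what the tool actually returned
  timestamp: string; // kept in the audit log alongside the call
}

interface AuditedResponse {
  text: string;      // what the model said
  trace: ToolCall[]; // what it actually invoked, from the execution log
}

// Claims that depend on live context, mapped to the tool that must back them.
const REQUIRED_TOOLS: Array<{ pattern: RegExp; tool: string }> = [
  { pattern: /\b(morning|afternoon|tonight|goodnight|\d{1,2}:\d{2}\s?(am|pm))\b/i, tool: "get_current_time" },
  { pattern: /\b(we discussed|as you said earlier|last week|three weeks ago)\b/i, tool: "search_memory" },
];

// A context-dependent claim with no matching tool call in the trace gets
// flagged rather than trusted: don't ask the model whether it checked, check the log.
function detectConfabulation(response: AuditedResponse): string[] {
  const invoked = new Set(response.trace.map((call) => call.name));
  return REQUIRED_TOOLS
    .filter(({ pattern, tool }) => pattern.test(response.text) && !invoked.has(tool))
    .map(({ tool }) => `claim requires ${tool}, but it was never invoked`);
}
```

In this sketch, the "goodnight" at 10:56am would trip the first rule: the phrase matches, and get_current_time never appears in the trace.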

Without continuity, that's just a funny anecdote about AI getting the time wrong. With continuity, it's a design pattern. The innovation-hallucination paradox means human judgment stays essential. The compounding problem means persistent context stays essential. It's always going to be: breadth from the AI, judgment from the human, continuity from the memory layer that connects them across time.

The message

AI is for everyone. Innovation requires something more — not more credentials, but more willingness to think. To push past the first answer. To know when something is wrong even when the AI says it's fine. To keep going when the AI settles.

The whole floor rises because a frontier LLM gives you access to explore the things that already exist. The ceiling is still yours to reach.

545 messages. Five hours. A structural flaw that could have shipped, caught by a human feeling that something was fundamentally wrong. That's the part AI can't do. That's the part that matters.

This is the first in a series about building THE WHEEL. The second, You Can't Skip the Failures, is about what happens when you use that raised floor to explore everything at once — and why the failures are the point.