My kids have a habit that drives me insane. I ask what cereal they want, and they say they don’t care. I pick one. They tell me I chose wrong.
Here’s what makes it worse: I actually don’t care either. I have no opinion on the cereal. I never think about the cereal. I don’t eat the cereal. I shop at Costco, so we have enough Cheerios and Raisin Bran and Special K to outlast a minor apocalypse. They all exist in this house, they can have any of them, it literally does not matter. And I’m not eating their cereal anyway. I have opinions about my cereal. That’s a separate conversation.
The reason I asked is precisely because I wanted to not have to think about it. But now that they’ve opted out, I have to think through what they probably want, factor in what we have, weigh what they complained about last time, and make a call on behalf of a preference I was never consulted about, don’t share, and that doesn’t affect me in any way. The burden didn’t go away. It just moved to the person least equipped to carry it.
I did the exact same thing building THE WHEEL.
The category I didn’t know I was avoiding
Not consciously. I genuinely thought there was a whole category of decisions I didn’t have opinions about. Font weight. Error state copy. The color of a divider line. The shape of an empty screen. The right word for a button that saves something. These were decisions that needed opinions. I just never had to be the one to have them. Someone always cared more. Someone always had a stronger instinct, a reference, a reason. The opinion got made without it ever needing to be mine.
And if you know me, you are laughing right now, because I am not someone who has historically struggled to have opinions. I have very specific opinions about the correct consistency of salad dressing and I will not be taking questions. It turns out I just hadn’t been forced to have opinions about this particular category yet.
When you’re building alone with AI, that’s over.
The AI will always give you something. That’s the thing people don’t quite understand about it. It doesn’t leave blanks. It makes choices, confidently and completely, with no indication that the choice it made was one of many it could have made. Then you’re standing there with a finished thing in your hands, and you have to decide: keep it or don’t.
What it gives you is not an opinion. It’s what an opinion looks like. It will argue equally confidently for the glowing orb or the paper plane, and if you push back it’ll argue for a hybrid, and if you push back on that it’ll argue for something else. Whatever preferences it inherited from training, it has no stake in your version of the answer. That’s actually useful. You get fast, articulate options from something that has no territorial feelings about its first suggestion. But the having of the opinion still has to come from somewhere, and that somewhere is you.
Keeping it is a choice. You’re responsible for what you ship. “The AI did it” is not a defense you get to use, especially not with yourself.
The library was the best and worst example
The library is the place where every document, note, chat, and file you’ve ever added lives. It’s one of the most-used parts of the app, and it needed a redesign because the original version was fine in the way that things are fine when no one has really made a decision about them. It just sort of accumulated. So I started over.
The process was thorough. I defined user types and gave each one to an agent with a specific prompt: be the consultant managing fifteen clients who needs folders to stay sane; be the privacy-minded researcher who finds a “database” view suspicious because it feels like everything is floating in a cloud somewhere; be the developer who immediately asks where dark mode is and then has to be convinced to engage with the light-mode-only constraint I’d set. A dozen agents in total, each with a defined persona, each asked to evaluate four distinct design directions and explain their reasoning.
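For the curious, the harness itself is almost embarrassingly simple. Here’s a minimal sketch, assuming the OpenAI Python SDK; the personas are paraphrases and the briefs are placeholder strings, since the real ones pointed at full mockups:

```python
# Minimal sketch of the persona-evaluation harness (assumes the OpenAI
# Python SDK). Personas are paraphrases; briefs are placeholder strings.
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "You are a consultant managing fifteen clients; you need folders to stay sane.",
    "You are a privacy-minded researcher; a 'database' view feels like your "
    "files are floating in a cloud somewhere.",
    "You are a developer; your first question is where dark mode is. The "
    "constraint is light mode only. Engage with it anyway.",
    # ...nine more
]

DESIGNS = {
    "folders-first": "Brief for the folders-first direction...",
    "flat-database": "Brief for the database-view direction...",
    "tag-chips": "Brief for the tag-chips direction...",
    "hybrid": "Brief for the hybrid direction...",
}

def evaluate(persona: str, brief: str) -> str:
    """One agent, one direction: evaluate it, explain the reasoning, score it."""
    response = client.chat.completions.create(
        model="gpt-4o",  # whichever model you run
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": (
                "Evaluate this library design direction. Explain your "
                f"reasoning, then score it 1-10.\n\n{brief}"
            )},
        ],
    )
    return response.choices[0].message.content

results = {
    (persona, name): evaluate(persona, brief)
    for persona in PERSONAS
    for name, brief in DESIGNS.items()
}
```

Twelve personas against four directions is forty-eight structured opinions, back in minutes.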
The design directions were serious. Not sketches. Fully realized, interactive mockups, built and tested in a day.
That’s the thing about running structured evaluations with AI agents: you get real output fast. A defined question, twelve distinct perspectives, a clear winner, a detailed brief, done. The winning design was built, fully realized, incorporating all the feedback. Folders for the people who needed folders. Density for the people who needed density. Keyboard shortcuts for the people who would never use anything that didn’t have them.
I looked at it and hated it.
Not in a vague way. Not in a “something feels off” way that might be solvable with tweaks. I hated it specifically and immediately. It checked every box and I still hated it. The boxes weren’t the requirements. I just hadn’t known what the actual requirements were until I was looking at the thing that violated them. What emerged from trying to be everything to everyone was something I wouldn’t want to use. And I’m building this for people like me.
So I ripped it out. The problem wasn’t that the hybrid design was ugly. It was that it had no default shape. It tried to satisfy the folder people and the tag people and the database people all at once, which meant there was no obvious starting point. A new user would land and have no idea what they were supposed to do first. For a beta, that’s fatal. You lose people before they’ve seen anything.
What I went back to was closer to the original, but quietly extensible. The default view is simple: a flat list, filter chips across the top, nothing to configure. Folders are really just tags with a different shape now, so they can exist if you want them. You can use them, organize around them, ignore them entirely. They just don’t announce themselves to people who don’t need them. The complexity is there. It’s just not in your face.
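In code, “folders are just tags with a different shape” is a data-model decision more than a UI one. A sketch of the idea, with illustrative names rather than the actual schema:

```python
# Sketch of "folders are just tags with a different shape."
# Names are illustrative, not the actual schema.
from dataclasses import dataclass, field

@dataclass
class Tag:
    name: str
    kind: str = "tag"  # "tag" or "folder": same record, different rendering

@dataclass
class Item:
    title: str
    tags: list[Tag] = field(default_factory=list)

def filter_items(items: list[Item], active: set[str]) -> list[Item]:
    """The default view: a flat list, narrowed by whichever chips are active."""
    if not active:
        return items  # nothing selected, nothing to configure: show everything
    return [item for item in items if active & {t.name for t in item.tags}]

# A folder view is the same query grouped by tags whose kind is "folder".
# Users who never create one never see one.
```

The default path never touches kind. That’s the quiet part of quietly extensible.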
Twice more, and not for design
The first time was a small one. I was picking a model. I ran the same set of evaluations across six problem domains, twice, on a holiday weekend, with the laptop on the kitchen counter and the kids watching a movie in the next room. There was a clear winner on every metric I cared about. Faster. Fewer errors. Cheaper at the volumes I was projecting. The decision was also reversible. If the new model turned out worse in production I could swap back inside an hour. I still spent two days deciding. The numbers don’t make the call. They tell you what you’re trading and they ask you whether you want to trade it. I did. But I had to be the one to say so.
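Part of what made that decision reversible is boring on purpose: the model name is one config value, read in one place. A sketch, with illustrative identifiers, assuming the same OpenAI-style client as above:

```python
# Sketch of why the swap was a config change. Identifiers are illustrative.
import os

from openai import OpenAI

client = OpenAI()
MODEL = os.environ.get("WHEEL_MODEL", "new-model")  # hypothetical env var

def ask(prompt: str) -> str:
    # Every call site funnels through here, so rolling back is one
    # environment variable and a redeploy. No code changes to hunt down.
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```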
The second time was harder.
I’d been building a piece of infrastructure that changes how the AI responds across the entire product. The whole point of it is that the AI stops improvising. Instead of asking the model a question and hoping the answer comes back in a shape your code can read, you give it a strict template and the model fills it in. Less guessing, fewer surprises, more things that just work. I had it working in staging. Once it was working I could see what it actually wanted to be. I tightened the templates. I added the fallbacks. I made the error paths specific instead of generic. I tested it until I was bored of testing it.
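If “strict template” sounds abstract, the pattern is schema-validated output: the model fills in a shape your code declared, and anything that doesn’t validate hits a specific fallback. A minimal sketch using Pydantic for the validation step; the field names and the fallback are illustrative, not the actual templates:

```python
# Sketch of the strict-template pattern. The model fills a declared schema;
# validation failures hit a specific fallback, not a generic error.
# Field names and the fallback are illustrative.
from pydantic import BaseModel, ValidationError

class ReminderDraft(BaseModel):
    title: str
    due: str           # ISO 8601 date the model is required to supply
    confidence: float  # the model's stated confidence, 0 to 1

FALLBACK = ReminderDraft(title="Untitled reminder", due="", confidence=0.0)

def parse_reminder(raw: str) -> ReminderDraft:
    try:
        return ReminderDraft.model_validate_json(raw)
    except ValidationError as err:
        # Specific, not generic: we know exactly which fields broke the
        # template, and the UI knows how to render the degraded draft.
        broken = [e["loc"] for e in err.errors()]
        print(f"template violation at {broken}")
        return FALLBACK
```

The model can still be wrong. It just can’t be wrong in a shape the code didn’t anticipate.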
The model swap had been a config change. This one was a posture change. Once it was on, the entire product was committed to a different way of asking the model questions, and unwinding it meant undoing the work in a way that would leave me without the testing harness I’d built around it.
And then I sat at my desk on a Tuesday morning with the deploy command in a terminal window, my finger hovering over the return key, and I did not press it. I closed the terminal. I made coffee. I came back two hours later and did it then. The system is designed to take the guessing out of how the AI responds. It cannot take the guessing out of whether to turn it on.
Smaller and faster
The empty states were the same exercise, just smaller and faster.
Every section of THE WHEEL needs an empty state. The screen a user sees before they’ve added anything. Chat, notes, documents, reminders, search, library. Each one needed an illustration. The first round for chat alone was ten options. Abstract metaphors: ripples, constellations, a sunrise over a horizon. Literal ones: scattered papers, a window frame, a folded origami triangle. A glowing orb in the style of Notion, a cartoon face like Slack, a geometric stack like Stripe. I looked at all of them and immediately knew that most were wrong. Not wrong in some abstract aesthetic sense. Wrong for this. Wrong in a way I could feel before I could explain.
I wasn’t being indecisive about it. I was learning what I thought. Each round made visible a preference I couldn’t have stated before I built the thing that violated it. The orb was too generic. The face was too cute. The stack was too corporate. None of those were criteria I would have written down beforehand. They were criteria that existed only as the negations of things I’d looked at and rejected.
Most of the sections landed in two or three rounds. Documents became four colorful floppy disks at jaunty angles, because the whole point of documents is that they’re things you keep, and a floppy is the platonic image of keeping something. Notes became scattered post-its in the colors people actually use. Chat became a speech bubble with a single blinking cursor. The reminders one was the holdout. It went through several rounds, got selected as final, got literally marked FINAL in the file, then got reconsidered. At some point mid-process I said: this is too blue. It needs to look like a mall You Are Here sign. That became a direction. A dark directory board, a cream-colored grid, an amber star at the center. It was selected. It was marked final again. And then it got revisited after that.
The mall sign wasn’t the answer because I started with wayfinding as a reference. I got there by saying what was wrong with what I had, and the answer emerged from the negation. “Too blue” wasn’t a color note. It was “this is trying too hard to look like software and should look more like a thing.” I didn’t know that’s what I meant until I said it out loud.
It does get easier
Not in the sense that the answers get obvious. In the sense that I’ve learned what my own taste sounds like when something violates it. I have a bias toward density. I tolerate complexity in infrastructure and hate it in interfaces. I know what “too blue” means now. I know what I mean when I say something feels corporate, or clever, or scared. The vocabulary builds up. The negation gets faster. The first round of options stops being ten and starts being three, because I can rule out most of the obvious wrong answers before anyone draws them.
That’s the part that gets easier. Not the deciding. Just the recognizing. The cost of having to be the one who decides is the same as it ever was. I just spend less time staring at the wrong answers before I know they’re wrong.
Eventually I will outsource some of it to a person. When there’s someone with better design sense in the room, they should be making these calls, and they will make them better than I do. Until then, I’m it. The only honest thing to do is build the calls so the next person can swap them out without inheriting my taste as law. Most of what’s in THE WHEEL right now is shaped that way: opinionated defaults, easy to replace.
You can generate the options. You can run the agents. You can let the AI write four versions of the same paragraph and pick the one that feels least wrong. All of that is useful, and you should do it. But every one of those steps ends in the same place: a moment where a human decides. At some point you’re standing in the kitchen at 7am holding a box of Cheerios and a box of Raisin Bran, and the kid who said they didn’t care is going to tell you that you picked wrong. The only way out is to have actually wanted one of them.