The Real Moat Isn't Software

Everyone's fighting over chat interfaces. The real problem is in the physical world.

Researcher and builder. Background across AI, defense, biotech, and hardware. Writing about persistent memory systems and cognition. joshadler.com

I spent three months building the software layer of an AI memory system. Retrieval pipelines, encoding gates, vector search, temporal decay. Real work, real results, published a paper on it. And then I realized none of it mattered as much as a $15 Raspberry Pi with a camera taped to a shelf.

That sounds like I'm being dramatic. I'm not.

Your AI Knows Almost Nothing About You

Every piece of information your AI has about you came through a text box. You typed it during a conversation you chose to have, about something you remembered to bring up. Your AI doesn't know that you pace when you're anxious, that you bought a standing desk and used it twice, or that your actual sleep schedule looks nothing like the one you'd report if someone asked.

Most of the patterns that define you are invisible to you because they're automatic. You never mention them, so your AI never learns them. And the stuff people do share is filtered through self-perception, which is wildly unreliable. People say they work out four times a week when they go twice. They describe themselves as morning people while consistently opening their laptops at noon.

The models are smart enough, and they've been smart enough for a while now. The input layer is what's broken, and almost nobody is working on fixing it.

Why Software Moats Don't Last

Every AI memory company is racing to build better context management for chat interfaces: better RAG, smarter reranking, novel encoding strategies. These are real innovations, but they're all just code. Someone reads your paper, understands the approach, and ships their own version in a weekend. I've watched it happen with my own work.

Hardware can't be copied like that. Physical deployment, sensor calibration, months of debugging driver conflicts and thermal issues: you can't replicate any of that from a GitHub repo. You have to build it and survive the thousand small failures that come with putting electronics on walls.

Five Nodes, $500, Months of Pain

I built a five-node sensor network in my apartment called Paradox. Each node is a Raspberry Pi Zero 2W with an ArduCam IMX708 12MP wide-angle camera (120-degree FOV) and a WM8960 audio HAT, at about $100 per node and $500 total. A custom Python daemon handles motion-triggered and audio-triggered recording at 1280x720 and 15fps, with a low-res 320x240 stream for motion detection. Everything ships to a NAS, and inference runs on an RTX 5090.
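To make the two-stream idea concrete, here is a minimal sketch of how a motion trigger on the low-res stream could gate the high-res recorder. It is not the Paradox daemon: the frame-differencing approach, the `MotionTrigger` class, and every threshold below are assumptions chosen for illustration, and frames are modeled as flat grayscale `bytes` rather than real camera buffers.

```python
from typing import Optional

LOW_RES = (320, 240)  # detection stream size mentioned in the post


def changed_fraction(prev: bytes, curr: bytes, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose grayscale value moved by more than pixel_delta."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


class MotionTrigger:
    """Hypothetical helper that decides when the high-res recorder should run."""

    def __init__(self, area_threshold: float = 0.02, cooldown_frames: int = 45):
        self.area_threshold = area_threshold    # assumed: 2% of pixels must change
        self.cooldown_frames = cooldown_frames  # assumed: ~3 s of quiet at 15 fps
        self.recording = False
        self._prev: Optional[bytes] = None
        self._quiet = 0

    def update(self, frame: bytes) -> bool:
        """Feed one low-res grayscale frame; returns True while recording should be on."""
        prev, self._prev = self._prev, frame
        if prev is None:
            return self.recording  # nothing to compare against yet
        if changed_fraction(prev, frame) >= self.area_threshold:
            self.recording = True
            self._quiet = 0
        elif self.recording:
            # No motion in this frame: count toward the cooldown before stopping.
            self._quiet += 1
            if self._quiet >= self.cooldown_frames:
                self.recording = False
        return self.recording
```

The cooldown is the design choice that matters here: without it, a person who pauses for a second would split one event into many short clips. Running the comparison on 320x240 frames keeps the per-frame cost small enough for a Pi Zero-class CPU, which is presumably why a separate low-res stream exists at all.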

That sounds clean. The reality was months of garbage.

I started with OwlSight 64MP sensors because the specs looked incredible, but they ran so hot the Pi would thermal-throttle within twenty minutes. The driver needed a custom dtoverlay with a specific link-frequency parameter that I spent entire nights debugging, staring at dmesg output at 2am, trying to figure out why a camera initialized on one node but failed on another with an identical SD card image. The answer was always something dumb: a loose ribbon cable, a kernel version mismatch, a power supply that couldn't hold.

Ripped all five out, switched to the IMX708. Less impressive on paper, dramatically more stable. The hardware lesson software engineers never learn: optimize for "does it work at 3am when nobody's watching," not the spec sheet.

What One Week of Data Showed Me

Within a week the system had captured patterns I never would have typed into a chat window: movement patterns through the apartment, my actual sleep schedule versus what I'd report, how long I really sit at my desk compared to how long I think I do. One hour of physical observation generates more behavioral data than a year of chat transcripts. The gap between self-reported behavior and observed behavior is enormous, and that gap is exactly where AI understanding falls apart.

When your AI knows you've been averaging 45 minutes less sleep this week, not because you logged it but because it watched the lights go off and come on, that's not smarter reasoning. It's reasoning with actual information instead of whatever you remembered to type into a prompt.

The Three-Layer Stack

The way I think about it, there are three layers to AI that actually works for a specific person. Observation is at the bottom: the cameras, microphones, and sensors that capture the physical world. That's Paradox. Memory is in the middle: encoding all that data intelligently, deciding what matters, and letting the rest decay. That's what I built TrueMemory to solve (it's modeled on biological memory, with encoding gates, salience scoring, and temporal decay; the full architecture is in my arXiv paper). Reasoning is at the top: the LLM itself, Claude or GPT or whatever ships next.
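The memory-layer terms above (encoding gate, salience score, temporal decay) can be sketched in a few lines. This is an illustrative toy, not TrueMemory's architecture: the exponential half-life model, the `Observation` fields, and all thresholds are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    label: str
    salience: float   # 0..1, assumed to come from some upstream scorer
    age_hours: float  # time since the event was observed


def encoding_gate(obs: Observation, min_salience: float = 0.3) -> bool:
    """Gate: only observations above a salience floor get stored at all."""
    return obs.salience >= min_salience


def retained_strength(obs: Observation, half_life_hours: float = 72.0) -> float:
    """Temporal decay: strength halves every half_life_hours (assumed model)."""
    return obs.salience * 0.5 ** (obs.age_hours / half_life_hours)


def consolidate(observations: List[Observation], keep_above: float = 0.1) -> List[Observation]:
    """Keep gated observations whose decayed strength is still above keep_above."""
    kept = [
        o for o in observations
        if encoding_gate(o) and retained_strength(o) >= keep_above
    ]
    return sorted(kept, key=retained_strength, reverse=True)
```

With these assumed defaults, a high-salience event from three days ago survives at half strength, a trivial event never passes the gate, and a mildly salient event from twelve days ago has decayed below the cutoff. Where the salience score comes from is the hard part, and this sketch leaves it open.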

Nearly all investment flows into that top layer and the reasoning is getting incredible, but it's reasoning on top of almost nothing because the bottom two layers barely exist. It's like building the world's most powerful engine and putting it in a car with no windows.

The Honest Constraints

I'm not going to pretend this is solved.

You can't put a camera in your car, you can't put one in your office without getting fired, you can't wear Meta glasses to dinner without everyone at the table feeling weird about it. The spaces where the most meaningful behavioral data lives are exactly the spaces where cameras are socially unacceptable, legally complicated, or both.

The Pi Zero 2W draws 1.5W idle but spikes to nearly 4W under camera load, so battery operation is off the table. Five cameras at 15fps generate a genuinely stupid amount of data, and I spent a week building cleanup pipelines just to keep storage from overflowing.
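For a sense of what a cleanup pipeline involves at its simplest, here is a sketch of a size-capped retention pass that deletes the oldest recordings first. It is an assumption-laden illustration rather than the pipeline described above: the `prune_oldest` name, the `.mp4` pattern, and the idea of a single byte budget per directory are all hypothetical.

```python
from pathlib import Path
from typing import List


def prune_oldest(root: Path, max_bytes: int, pattern: str = "*.mp4") -> List[Path]:
    """Delete the oldest matching files under root until they fit in max_bytes.

    Returns the paths that were deleted, oldest first.
    """
    files = [(p, p.stat()) for p in root.rglob(pattern) if p.is_file()]
    files.sort(key=lambda item: item[1].st_mtime)  # oldest modification time first
    total = sum(st.st_size for _, st in files)

    deleted: List[Path] = []
    for path, st in files:
        if total <= max_bytes:
            break  # already under budget, keep everything newer
        path.unlink()
        total -= st.st_size
        deleted.append(path)
    return deleted
```

A real pipeline would likely keep anything the memory layer has flagged as worth retaining and prune by importance rather than age alone. Age-based pruning is just the simplest policy that stops a disk from filling.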

And honestly the hardest constraint isn't technical. My girlfriend didn't speak to me for two days after I installed the cameras. We worked it out: there are zones now, rooms where cameras don't run. But social acceptability is a constraint as hard as any thermal limit, and you can't debug your way out of it. Humane's AI Pin flopped partly because nobody wants a camera pointed at them during conversation. The observation layer needs to work socially, not just technically, and right now it doesn't.

When the Layers Connect

When all three layers actually work together, your AI notices you haven't left your desk in six hours, cross-references your calendar showing back-to-back meetings, remembers that this exact pattern preceded you getting sick last month, and says something about it before you even realize what's happening. Not because you asked, not because you typed anything, but because it was in the room with you and it remembered what happened last time and it connected dots you didn't even know existed.

That's not a chatbot. It's something completely different from what anyone is actually building right now.

Nobody wins by building a better chat interface. The chat interface is temporary, an artifact of the fact that we haven't figured out how to get AI into the room with you.

I don't have this figured out. Five cameras, a girlfriend who tolerates them with conditions, a NAS that fills up too fast. But the moat is who gets AI into the physical world first, who figures out observation that's accurate enough to be useful and unobtrusive enough to be acceptable and intelligent enough to know what to keep. That's a hardware problem, a social problem, and a memory problem all tangled together.

That's why I'm debugging dtoverlay configurations instead of writing another wrapper. The hard part was never the software.


Josh Adler is a researcher at TrueMemory, a Sauron company. Research: arXiv:2605.04897. More at joshadler.com.