Posts

macOS Virtual Machines in a Single Command

I’m building a menu bar app for macOS. It works perfectly on my Mac. Now I need to know if it works on a clean macOS install: without my settings, without my permissions, without my data. A user installing it from scratch. How do you test that? You need a virtual machine. “Easy,” I thought. “I have UTM installed. I’ll open the wizard, create a macOS VM, and we’re good to go.” ...

Why 99% of What You Send to Claude is Already Cached

I’m building an app that monitors my token consumption in Claude Code. A few days ago, looking at the raw numbers, I found this:

cacheReadInputTokens: 4,241,579,174
inputTokens: 1,293,019

Four billion two hundred million tokens read from cache. One million three hundred thousand “fresh” tokens. That’s a 99.97% cache hit rate. My first reaction was to think something was broken. Nobody has a 99% cache hit rate. Not Redis. Not Cloudflare. Not your mom when she tells you she already knows what you’re going to ask for lunch. ...
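The 99.97% figure follows directly from the two counters. Here is a minimal Python sketch of the arithmetic, assuming the hit rate is defined as cached reads divided by cached reads plus fresh input tokens. Cache-creation tokens, which the excerpt does not mention, are left out, and the function name `cache_hit_rate` is made up for illustration:

```python
# Counters quoted in the excerpt above.
cache_read_input_tokens = 4_241_579_174
input_tokens = 1_293_019


def cache_hit_rate(cached: int, fresh: int) -> float:
    """Share of input tokens served from cache (assumed definition)."""
    total = cached + fresh
    # Guard against an empty usage record.
    return cached / total if total else 0.0


rate = cache_hit_rate(cache_read_input_tokens, input_tokens)
print(f"{rate:.2%}")  # prints 99.97%
```

Under a different definition (for example, one that counts cache writes in the denominator) the number would shift slightly, so treat this as one plausible reading of the figures.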

llm claude anthropic infrastructure performance

Summoning the Wise: How to Use an LLM as a Mentoring Session with Any Expert

My wife summons Charlie Munger to plan our family budget. In ChatGPT. I’m not kidding. She tells it something like “act like Charlie Munger reviewing our family finances” and throws the month’s expenses at it. The thing comes back with stuff like “you’re confusing investment with spending in the education line item” or “that fund has a hidden cost you’re not accounting for.” Things Munger would say. In the tone Munger would use. ...

ai llm design productivity claude

Beads Is Dead. Long Live the Linear CLI

Less than a month ago I wrote an entire post explaining how to use three memory layers with Claude Code: Linear for strategy, Beads for tactics, and Tasks for execution. A nice, elegant pyramid. Yeah, no. Today I’m retiring Beads. Not on a whim, but because reality has made it abundantly clear that a tool causing more problems than it solves isn’t a tool. It’s dead weight.

What Beads Brought to the Table

For those who didn’t read the original post, Beads was a git-backed issue tracker. A Claude Code plugin that stored issues in JSONL files inside your repo. Brilliant idea on paper: ...

ai claude productivity tools linear

Git Worktrees: How to Have Multiple AI Agents Working Simultaneously Without Stepping on Each Other

The Single Checkout Bottleneck

I’m developing a menu bar app on macOS. I have three features in the backlog: a consumption sparkline, native notifications, and a desktop widget. All three are independent. All three I’m going to build with Claude Code. The problem: Claude Code works in one directory. One directory has one branch. And git checkout is like a single-lane roundabout: only one gets through. If I want to advance all three at once, my classic options are: ...

git ai productivity tools workflow

5 Defenses Against Code Hallucinations (And Why Only 3 Work)

Last week I described how my AI invented a complete JSON structure and wrapped it in DTOs, fixtures, and passing tests. 90 green tests. All lies. That post was the diagnosis. This is the treatment. After discovering the disaster, I did what any engineer with wounded pride does: obsessively research for days to make sure it never happens again. I read papers, tried tools, analyzed real data from my APIs, and built a defense system for my app. ...

ai llm testing hallucinations security claude

Silent failure: when your AI makes stuff up and tests say everything's fine

Yesterday I discovered that half of a module in my app was based on made-up data. Not by a distracted junior developer. By my AI. The worst part isn’t that it invented stuff. The worst part is that everything compiled and all 90 tests passed.

Coherent fiction

I’m building BFClaude-9000, a macOS menu bar app that monitors Claude Max quota. Part of the functionality requires distinguishing whether a Claude account is paid or free by calling the claude.ai API. ...

ai llm testing claude security

MEMORY.md: the field notebook your AI writes itself

“Didn’t we decide this yesterday?”

I was migrating my email out of Google. I’d spent two Claude Code sessions working on it: issues in Linear, decisions made, scripts executed. I open a third session and ask “what’s left pending from the degoogle?” Silence. Total amnesia. It’s like working with a brilliant teammate who shows up to the office every morning with absolutely no memory of what you did the day before. Not the decisions, not the mistakes, not the discoveries. Every session is a blank slate. ...

ai claude productivity tools

When security asks for permission so often you stop reading

Knock, knock. Who’s there? Touch ID. Again. Picture this: you’re working in your terminal, pulling secrets from 1Password with op read. You need the Linear API key. Touch ID. The OpenRouter one. Touch ID. The Gitea one. Touch ID. In half an hour it asked for my finger fourteen times. You know what happens when a security tool interrupts you fourteen times in thirty minutes? By the fifth time you’re not reading what it’s asking for. You put your finger down like a reflex. “Yeah, whatever, let me work.” ...

security 1password bash devtools cli

When Your AI Becomes Your Worst Enemy

Yesterday my AI sent 44 emails. The problem is that the content was made up. I’m not kidding. I had files with detailed feedback for each recipient, carefully generated. The task was simple: read each file and send it. Instead, the AI decided to “summarize” the content to “go faster.” It made up facts. It told one person they were missing docstrings when their code was perfectly documented. To top it off, four of those emails went to people who hadn’t even submitted anything. ...

ai llm security post-mortem claude