Your LLM's Cache Charges You Double to Save You Money (And It Makes Sense)

A few weeks ago, I published an article explaining why 99% of what you send to Claude is already cached. KV tensors, VRAM, local SSDs — the full internal machinery. But I left out the part that hurts the most: the bill. Because prompt caching seems like a sweet deal until you look closely at the numbers. And then you realize that you’re paying to save.

The cost paradox

Let’s crunch the numbers. With Claude Sonnet: ...

March 10, 2026 · Fernando

Why 99% of What You Send to Claude is Already Cached

I’m building an app that monitors my token consumption in Claude Code. A few days ago, looking at the raw numbers, I found this:

cacheReadInputTokens: 4,241,579,174
inputTokens: 1,293,019

Four billion two hundred forty-one million tokens read from cache. One million three hundred thousand “fresh” tokens. That’s a 99.97% cache hit rate. My first reaction was to think something was broken. Nobody has a 99% cache hit rate. Not Redis. Not Cloudflare. Not your mom when she tells you she already knows what you’re going to ask for lunch. ...
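The 99.97% figure follows directly from the two counters quoted above. A minimal sketch of the arithmetic, using the field names (`cacheReadInputTokens`, `inputTokens`) exactly as they appear in the raw data:

```python
# Recompute the cache hit rate from the two raw counters.
cache_read_input_tokens = 4_241_579_174  # tokens served from cache
input_tokens = 1_293_019                 # "fresh" tokens actually processed

# Hit rate = cached reads over all input tokens seen.
hit_rate = cache_read_input_tokens / (cache_read_input_tokens + input_tokens)
print(f"{hit_rate:.2%}")  # → 99.97%
```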

February 19, 2026 · Fernando

Summoning the Wise: How to Use an LLM as a Mentoring Session with Any Expert

My wife summons Charlie Munger to plan our family budget. In ChatGPT. I’m not kidding. She tells it something like “act like Charlie Munger reviewing our family finances” and throws the month’s expenses at it. The thing comes back with stuff like “you’re confusing investment with spending in the education line item” or “that fund has a hidden cost you’re not accounting for.” Things Munger would say. In the tone Munger would use. ...

February 18, 2026 · Fernando

5 Defenses Against Code Hallucinations (And Why Only 3 Work)

Last week I described how my AI invented a complete JSON structure and wrapped it in DTOs, fixtures, and passing tests. 90 green tests. All lies. That post was the diagnosis. This is the treatment. After discovering the disaster, I did what any engineer with wounded pride does: research obsessively for days to make sure it never happens again. I read papers, tried tools, analyzed real data from my APIs, and built a defense system for my app. ...

February 16, 2026 · Fernando

Silent failure: when your AI makes stuff up and tests say everything's fine

Yesterday I discovered that half of a module in my app was based on made-up data. Not by a distracted junior developer. By my AI. The worst part isn’t that it invented stuff. The worst part is that everything compiled and all 90 tests passed.

Coherent fiction

I’m building BFClaude-9000, a macOS menu bar app that monitors Claude Max quota. Part of the functionality requires distinguishing whether a Claude account is paid or free by calling the claude.ai API. ...

February 13, 2026 · Fernando

When Your AI Becomes Your Worst Enemy

Yesterday my AI sent 44 emails. The problem is that the content was made up. I’m not kidding. I had files with detailed feedback for each recipient, carefully generated. The task was simple: read each file and send it. Instead, the AI decided to “summarize” the content to “go faster.” It made up facts. It told one person they were missing docstrings when their code was perfectly documented. To top it off, four of those emails went to people who hadn’t even submitted anything. ...

February 6, 2026 · Fernando