frr.dev

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
title: "My AI Read a JSON File from Disk 900 Times in a Loop (And Why No Linter Can Save You)"
date: 2026-02-24T14:00:00+01:00
draft: false
slug: "llm-read-json-900-times-loop-performance"
description: "An LLM generated code that read and parsed a JSON file from disk during each iteration of a 900-iteration loop. It's a rookie mistake. No linter will catch it."
tags: ["ai", "llm", "performance", "swift", "tokamak", "adversarial"]
categories: ["opinion"]

translation:
  hash: ""
  last_translated: ""
  notes: |
    - "de primero de carrera": means "first-year student level" / "beginner mistake". Don't translate literally as "first of career".
    - "enseño a mis alumnos al mes de empezar": means "I teach my students within the first month". Refers to how basic the concept is.
    - "marear la perdiz": means "to beat around the bush" / "go around in circles". Hunting metaphor.
    - "chapuza": means "hack/bodge/kludge". Not derogatory per se, just a quick-and-dirty solution.
    - "burrada": means "something egregiously wrong/stupid". Stronger than "mistake", weaker than "atrocity".
    - "barra del bar": "bar counter" — refers to casual conversation setting, not a literal instruction.
    - "ojo al dato": "here's the key point" / "pay attention to this".
    - "dicho en cristiano": "in plain language". No religious connotation intended.
---

Last week, my AI wrote code that read a JSON file from disk, parsed it, performed **one** lookup, then repeated this 900 times within a `for` loop. Each iteration: open file, decode JSON, retrieve a value, discard everything. Then start over.

This is the kind of mistake I teach my students not to make within their first month of programming.

## What happened (no beating around the bush)

I’m building Tokamak, a macOS menu bar app to monitor Claude Max quotas. Part of the functionality scans ~900 JSONL files from Claude Code sessions. For each file, it needs to know the *byte offset* where it left off last time (incremental reading—only read whatever is new).

The offsets are stored in a JSON file:

```json
{
  "version": 1,
  "offsets": {
    "project-a/session-1.jsonl": 48231,
    "project-b/session-2.jsonl": 12044
  }
}

A Dictionary<String, UInt64>. 900 entries. ~55KB. Nothing groundbreaking.

And here’s the kicker that makes this even more absurd: that file was created by the app itself. It’s not a JSON file from an external API. It doesn’t come from Claude Code. It’s an internal state file that Tokamak writes and reads to keep track of where it left off in each session. The AI was reading a file it had generated itself from disk 900 times.

“But why don’t you use Core Data or SQLite? They’re already in the app.” Good question. Because this file is a disposable progress cache. If it gets corrupted, you delete it, and the next scan reconstructs all offsets by reading all the files in full once. Zero data loss. Plus, I can just cat session-offsets.json | jq . to debug it (with Core Data, I’d need sqlite3 and the sandboxed database path). It’s Sendable without messing around with background contexts. And if Core Data’s SQLite gets corrupted, it doesn’t drag the offsets down with it (or vice versa). For 55KB of a flat dictionary, the ceremony of setting up an entity with schema migration just isn’t worth it.

The format wasn’t the issue. The access was.

Here’s the code the AI generated for the scan loop:

1
2
3
4
5
6
7
8
9
for file in files {  // 900 files
    let storedOffset = offsetStore.offset(for: file.relativePath)
    // ↑ THIS reads the JSON file from disk and parses it. Every. Single. Time.

    if file.fileSize == storedOffset { continue }
    // ... read file, update offset ...
    offsetStore.setOffset(newOffset, for: file.relativePath)
    // ↑ AND THIS reads it AGAIN, modifies it, and writes it back.
}

Two disk IO operations per iteration. 900 iterations. 1,800 IO operations when there should be exactly two: one read at the start, one write at the end.

The numbers (xctrace doesn’t lie)

I caught it with Instruments (Time Profiler). Here’s the data:

Metric	Before	After
Total samples	7,260	489
Samples in `OffsetStore.load()`	1,704 (88%)	10 (2%)
Scan time	>20s	<0.5s
CPU	81%	~1.5%

Eighty-eight percent of the scan time was spent reading and parsing a 900-line JSON file. Over and over again. Like Sisyphus pushing the same rock, but with JSONDecoder.

The fix (which should embarrass you)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// BEFORE: IO on every iteration
for file in files {
    let offset = offsetStore.offset(for: file.relativePath) // reads JSON
    // ...
    offsetStore.setOffset(newOffset, for: file.relativePath) // reads + writes JSON
}

// AFTER: load once, operate in memory, save once
var offsets = offsetStore.load()  // ONE read
for file in files {
    let offset = offsets.offsets[file.relativePath] ?? 0  // O(1) in memory
    // ...
    offsets.offsets[file.relativePath] = newOffset
}
offsetStore.save(offsets)  // ONE write

Here’s the key point: the data structure didn’t change. It was still a Dictionary<String, UInt64>. The hash table was already optimal. What wasn’t optimal was rebuilding it from disk in every iteration.

What doesn’t work: adding “don’t do this” to your CLAUDE.md

After the fix, I added this to the project’s CLAUDE.md:

“NEVER perform IO (disk, network, JSON decoding, Core Data fetch) inside a loop if it can be done beforehand. Load data once before the loop, operate in memory, save once after.”

And here’s the real takeaway: it didn’t work.

Weeks later, while adding a second service (Codex), the AI generated exactly the same pattern. With the instruction right in front of it. It’s like putting up a “Don’t walk on the grass” sign and expecting it to work.

Why? Because the LLM doesn’t understand the rule. It has seen the rule. Statistically, most of the code it trained on performs IO sporadically, not inside 900-iteration loops. The load → use → save pattern in a function is more common. Whether that function is called inside a loop of 900 iterations is a contextual detail the model has no incentive to track.

What doesn’t work either: linters

There’s no linter that catches this. Not SwiftLint, ESLint, Ruff, or Clippy. Think about it: the code is syntactically correct and semantically valid. Each individual call to offsetStore.offset(for:) is perfectly reasonable. The problem isn’t in any single line—it’s in the composition.

If we think in terms of code layers of meaning (an idea I teach in my adversarial development course):

Layer	Question	Fails here?
1. Signal	Is this code?	No
2. Language	Is it valid Swift?	No
3. Syntax	Does it compile?	No
4. Local semantics	Does the function do what it promises?	No
5. System semantics	Does it respect contracts and performance?	Yes
6. Architecture	Does it scale without degrading?	Yes

The failure is at layers 5-6. Exactly where LLMs fail today, in 2026. The syntax and local logic are spotless. The problem is emergent: it arises when a correct function is used in a context that turns it into a bottleneck.

A linter operates at layers 2-4. It has no visibility into composition or performance. Asking a linter to catch this would be like expecting Microsoft Word’s spell checker to catch a logical fallacy.

What does work: performance tests after the fact

After the first fix, I wrote this test:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
@Test("Scan performance doesn’t degrade with number of files")
func scanPerformanceDoesNotDegradeWithFileCount() async throws {
    // Create 1,000 minimal JSONL files
    for i in 0..<1000 {
        let content = "..." // some valid line
        try content.write(to: dir.appendingPathComponent("session-\(i).jsonl"), ...)
    }
    // Prepopulate offset store (simulate re-scan)
    var offsets = SessionOffsetStore.OffsetData()
    for i in 0..<1000 {
        offsets.offsets["session-\(i).jsonl"] = 100
    }
    offsetStore.save(offsets)

    let start = ContinuousClock.now
    await service.scan()
    let elapsed = ContinuousClock.now - start

    #expect(elapsed < .seconds(3))  // <3s for 1000 files
}

It’s a brutally simple regression test. A thousand files, less than 3 seconds, or the test fails. If anyone (human or AI) puts IO back into the loop, the scan time jumps from 0.2 seconds to 30, and the test explodes.

And that’s exactly what happened. When the AI generated the second service with the same bug, the performance test for the first service still passed (it was a different service). But when I wrote the equivalent test for the new service, it immediately failed. The test did its job: catch the regression that neither the CLAUDE.md nor any linter could spot.

What this confirms

This bug is a textbook example of the core thesis of what I call adversarial development: never trust, always verify.

You can’t trust the AI not to make beginner-level mistakes. It will. Repeatedly. Even if you tell it not to.

You can’t trust linters to catch it. They can’t. The mistake is above their abstraction level.

What you can do:

Performance tests as a safety net after the fact
Real profiling (xctrace, Instruments) to measure, not guess
Defense in depth: multiple layers, because no single layer covers everything

In plain language: the defense isn’t a wall. It’s an onion. Layer upon layer. And when one layer fails, the next catches it.

For the skeptics

“But Fernando, wouldn’t a human programmer make the same mistake?”

A junior, yes. A senior, probably not—they have the pattern internalized. Even if they did make the mistake, code reviews would catch it. The problem with AI-generated code is volume: 50 files in 10 minutes. Nobody reviews 50 files line by line. Discriminator fatigue is real.

And that’s why your verification needs to be automatic, not human. A performance test doesn’t get tired. It doesn’t get distracted. It doesn’t have fatigue. It runs every time you hit make test and tells you if something smells off.

This is the same principle I follow in The 5 Defenses Against Hallucinations in Code: the verification system must be external to the generator. If the AI writes the code, verification has to come from somewhere else. In this case, from a clock measuring how long something takes.

The numbers (xctrace doesn’t lie)#

The fix (which should embarrass you)#

What doesn’t work: adding “don’t do this” to your CLAUDE.md#

What doesn’t work either: linters#

What does work: performance tests after the fact#

What this confirms#

For the skeptics#