---
title: "33,000 Lines of XML to Tell You heavyWork() Is Slow: How I Tamed xctrace for LLMs"
date: 2026-03-08T14:00:00+01:00
draft: false
slug: "ztrace-xctrace-compact-summary-llm"
description: "xctrace exports 33,000 lines of XML that overwhelm any LLM's context window. ztrace condenses it into 10 actionable lines. Here's how and why."
tags: ["xctrace", "instruments", "profiling", "llm", "claude-code", "python", "performance"]
categories: ["opinion"]

translation:
  hash: ""
  last_translated: ""
  notes: |
    - "dicho en cristiano": "in plain language". No religious connotation.
    - "domesticar": used metaphorically as "to tame" (a tool/output). Not literal.
    - "chapuza": "hack/bodge/kludge". Quick-and-dirty solution, not derogatory.
    - "paja": means "filler/fluff/noise" in this context. Do NOT translate as the vulgar meaning.
    - "ojo al dato": "here's the key point" / "pay attention to this".
---

Last week, I was profiling a Swift app using *Instruments*. Nothing unusual: `xctrace record`, `xctrace export`, copy the XML into Claude Code's context, ask it to find the hotspots.

And Claude says: "The XML is too large; I can't process it reliably."

33,553 lines of XML. For a program with two functions.

## The Real Problem

`xctrace export` is an excellent tool. It gives you **everything**: every *sample*, every *backtrace*, every frame with its binary, memory address, and UUID. It's exhaustive, precise, and complete.

And that's exactly the problem.

When profiling an app to find bottlenecks, I don’t need all 3,044 individual *samples*. I don’t need to know that *sample* number 1,847 caught the CPU at address `0x1027ec9a8` in `libswiftCore.dylib` at 00:02.847.882. I need to know that `heavyWork()` takes 70% of the time and `lightWork()` takes 30%.

In plain language: I need **ten lines**, not thirty-three thousand.

## Why XML Is the Right Format (but the Noise Isn't)

Before anyone says, "The problem is using XML in 2026": that's not it.

XML is the perfect format for what xctrace does. Think about it:

- **Hierarchical**: A *backtrace* is a tree of frames. A *sample* contains a *backtrace*, a *thread*, a *process*. XML naturally models this.
- **Self-descriptive**: Every element has a name, typed attributes, and a validatable structure. You don’t have to guess what the 7th field in a CSV line represents.
- **Elegant deduplication**: xctrace uses an `id`/`ref` system where it defines a frame the first time (`id="59" name="heavyWork()"`) and then references it with `ref="59"`. It’s essentially a serialized *flyweight pattern*.
- **Processable with standard tools**: XPath, `xmllint`, `xml.etree.ElementTree`... no need for a proprietary parser.

The XML from xctrace is not *bloat*. It's structured information that *Instruments* needs to reconstruct interactive call trees, compare *runs*, and filter by thread or process. It's designed for a GUI tool that can expand and collapse nodes.

The problem arises when you try to feed that information into an LLM's context window. It’s like trying to read the entire text of *Don Quixote* just to find the windmills reference. The information is there, but the signal-to-noise ratio is brutal.

## The Solution: ztrace

So, I built `ztrace`. A Python script that takes a `.trace` bundle and produces a compact summary.

Here’s the idea:

1. Run `xctrace export --toc` to get metadata (process, duration, template)
2. Run `xctrace export --xpath` to extract the `time-profile` table
3. Parse the XML, resolving the `id`/`ref` system
4. Filter system frames (anything living in `/usr/lib/` or `/System/`)
5. Aggregate by function and generate the summary
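Steps 1 and 2 can be sketched in a few lines of Python. The `xctrace export` flags (`--input`, `--toc`, `--xpath`) and the XPath form are the documented ones; the helper names and the simplified error handling are mine, not ztrace's actual code:

```python
import subprocess

def export_cmd(trace_path: str, *extra_args: str) -> list[str]:
    """Build the `xctrace export` argument list for a .trace bundle."""
    return ["xctrace", "export", "--input", trace_path, *extra_args]

def export_xml(trace_path: str, *extra_args: str) -> str:
    """Run xctrace (macOS only) and return the XML it prints to stdout."""
    cmd = export_cmd(trace_path, *extra_args)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Step 1: metadata (process, duration, template)
#   toc_xml = export_xml("sample.trace", "--toc")
# Step 2: the time-profile table, via the documented --xpath form
#   profile_xml = export_xml(
#       "sample.trace",
#       "--xpath", '/trace-toc/run[@number="1"]/data/table[@schema="time-profile"]',
#   )
```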

Pay attention: step 3 is more important than it seems. xctrace doesn’t repeat the full definition of a frame every time it appears in a *backtrace*. It defines it once with `id="59"` and then uses `ref="59"`. If you don’t resolve the *refs*, you lose most of the information.
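To make the `id`/`ref` mechanics concrete, here's a toy resolver with `xml.etree.ElementTree`. The XML fragment is hand-written in the shape xctrace uses, not real output:

```python
import xml.etree.ElementTree as ET

# Hand-made fragment in the id/ref shape xctrace uses: the frame is
# defined once with id="59", then referenced by ref="59".
XML = """
<root>
  <backtrace><frame id="59" name="heavyWork()" addr="0x1027ec9a8"/></backtrace>
  <backtrace><frame ref="59"/></backtrace>
</root>
"""

def resolve_refs(root: ET.Element, tag: str) -> list[ET.Element]:
    """Return every element of `tag`, replacing ref-only nodes with their definition."""
    by_id = {e.get("id"): e for e in root.iter(tag) if e.get("id")}
    return [by_id.get(e.get("ref"), e) for e in root.iter(tag)]

frames = resolve_refs(ET.fromstring(XML), "frame")
print([f.get("name") for f in frames])  # → ['heavyWork()', 'heavyWork()']
```

Without the lookup, the second frame would come back as `name=None`: exactly the "lost information" problem.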

## The Result

With my test fixture (a trivial program with `heavyWork()` at ~70% and `lightWork()` at ~30%):

```
$ ztrace summary sample.trace

Process: hotspot   Duration: 3.8s   Template: Time Profiler
Samples: 3044      Total CPU: 3044ms

SELF TIME
  69.4%  2113ms  hotspot  heavyWork()
  29.7%   905ms  hotspot  lightWork()

TOTAL TIME (callers with significant overhead)
  99.9%  3041ms  main

CALL STACKS
  69.4%  2113ms  main > heavyWork()
  29.7%   904ms  main > lightWork()
```


From 33,553 lines to 13. All the information an LLM needs to tell you, "Optimize `heavyWork()`, it's taking 70% of the CPU," fits in a tweet.

## What ztrace Filters (and Why)

Not everything reported by xctrace is actionable. When profiling an app, I can’t optimize `libdispatch.dylib`. I can’t rewrite `dyld4::PrebuiltLoader::loadDependents`. Those frames are noise when I’m hunting for *hotspots* in **my** code.

ztrace filters at multiple levels:

**System binaries**: Anything in `/usr/lib/` or `/System/` gets discarded. These are frames from the OS and Swift runtime.

**Runtime internals**: Functions like `__swift_instantiateConcreteTypeFromMangledNameV2` or `DYLD-STUB$$sin` technically live in your binary (statically linked), but they’re not your code. Out they go.

**Unresolved symbols**: Production apps (like Spotify, for instance) are often *stripped*. Frames will show up as raw addresses like `0x104885404`. ztrace filters those and notifies you: "85% of user samples are unsymbolicated." This way, you know the profile contains data but needs *dSYMs* to be useful.
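The three filters reduce to a single predicate. Here's a sketch, assuming each frame carries a symbol name and the path of its binary; the runtime-internal markers are an illustrative subset, not ztrace's exact list:

```python
import re

SYSTEM_PREFIXES = ("/usr/lib/", "/System/")
RUNTIME_MARKERS = ("__swift_instantiate", "DYLD-STUB$$")   # illustrative subset
UNRESOLVED = re.compile(r"^0x[0-9a-fA-F]+$")               # stripped binaries: raw addresses

def is_user_frame(symbol: str, binary_path: str) -> bool:
    """Keep only frames that belong to the developer's own code."""
    if binary_path.startswith(SYSTEM_PREFIXES):
        return False        # OS and Swift runtime binaries
    if any(m in symbol for m in RUNTIME_MARKERS):
        return False        # runtime internals statically linked into the app
    if UNRESOLVED.match(symbol):
        return False        # unsymbolicated: counted separately for the dSYM warning
    return True

print(is_user_frame("heavyWork()", "/Users/me/app/hotspot"))                   # → True
print(is_user_frame("dyld4::PrebuiltLoader::loadDependents", "/usr/lib/dyld")) # → False
```

The unresolved-symbol branch is the one worth counting rather than silently dropping: the ratio of filtered addresses to total user samples is what drives the "needs dSYMs" warning.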

## Testing with Real Apps

The fixture is convenient but artificial. Does it work with a real app? I tested it with Ghostty (the terminal emulator):

```
Process: ghostty   Duration: 3.8s   Template: Time Profiler
Samples: 295       Total CPU: 295ms

SELF TIME
  53.2%  157ms  ghostty  main
   3.7%   11ms  ghostty  renderer.metal.RenderPass.begin
   3.1%    9ms  ghostty  renderer.generic.Renderer(renderer.Metal).rebuildCells
   2.7%    8ms  ghostty  renderer.generic.Renderer(renderer.Metal).drawFrame
   2.4%    7ms  ghostty  renderer.generic.Renderer(renderer.Metal).updateFrame
   2.0%    6ms  ghostty  heap.PageAllocator.alloc
   1.7%    5ms  ghostty  terminal.page.Page.clonePartialRowFrom
   1.7%    5ms  ghostty  font.shaper.coretext.Shaper.shape
```


Now *this* is actionable. You can immediately see: the Metal renderer (render pass, rebuild cells, draw frame) and the font shaping are where the time is going. If you were optimizing Ghostty, you'd know exactly where to start.

And each function comes with its module (`ghostty`), so in an app with multiple frameworks, you'd know whether the bottleneck is in your code or a dependency.

## The Stack (and Why Not Swift)

The original `CLAUDE.md` said Swift. "Makes sense for the use case," I thought. After seeing that 95% of the work is parsing XML and formatting text, I switched to Python.

`xml.etree.ElementTree` parses XML in three lines. In Swift, `XMLParser` is pure SAX—callbacks, mutable state, delegates. A hack for something that should be "give me the tree and let me navigate."
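Those three lines, roughly, with an inline fragment standing in for the exported file:

```python
import xml.etree.ElementTree as ET

# Parse once, then just walk the tree — no callbacks, no delegate state.
root = ET.fromstring("<trace><frame name='heavyWork()'/></trace>")
names = [f.get("name") for f in root.iter("frame")]
print(names)  # → ['heavyWork()']
```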

Plus, a Python script can be distributed with `uv tool install`, while a Swift binary only runs on the platform it was compiled for. Granted, since xctrace only exists on macOS, cross-platform portability isn’t really the point here. But distributing with `uv` is still far cleaner than compiling and copying binaries.

## What’s Next

This is v0.1. What's missing:

- **`ztrace record`**: Record and summarize in a single command (convenience, not urgent)
- **Configurable filters**: Exclude specific modules, adjust call stack depth
- **Trace comparison**: Before/after optimization, in diff format
- **Support for Allocations**: Not just CPU, also memory

The repo is on [GitHub](https://github.com/frr149/ztrace) if you want to try it out.

## Integrating It Into Daily Workflow

The beauty of ztrace isn’t running it manually. It’s having Claude Code use it automatically when profiling.

You add this to your `CLAUDE.md` (global or project-specific):

```markdown
### Profiling (xctrace)

- Use `ztrace summary <file.trace>` to read traces. NEVER read the raw XML from xctrace export.
- Workflow: `xctrace record` → `ztrace summary`
- Flags: `--threshold 0.5` (more functions), `--depth 10` (deeper stacks)
```

From there, every time Claude Code needs to profile something, the workflow is:

```bash
# 1. Record
xctrace record --template 'Time Profiler' --time-limit 5s --launch -- .build/debug/MyApp

# 2. Summarize (10 lines that fit in the context)
ztrace summary MyApp.trace

# 3. Claude reads the summary and suggests optimizations
```
Without ztrace, step 2 would generate 30,000 lines of XML that either blow up your context window or drown the signal in noise. With ztrace, Claude has exactly what it needs to tell you “70% of the CPU is in heavyWork(), line 42 of Renderer.swift.”

## The Meta Point

ztrace exists because LLMs are bad at processing raw, large-scale data. They’re good at reasoning about processed, compacted information. Giving Claude 33,000 lines of XML is like handing a doctor a raw DICOM MRI file and asking for a diagnosis. The doctor needs the rendered image, not the raw bytes.

The next time an LLM tells you “the output is too big,” the solution isn’t a model with more context. It’s a better summary. A pipeline that turns raw data into actionable insights before it reaches the model.

Because at the end of the day, that’s what we engineers do: turn noise into signal. With or without AI in the mix.