```
$ ztrace summary sample.trace

Process: hotspot   Duration: 3.8s   Template: Time Profiler   Samples: 3044   Total CPU: 3044ms

SELF TIME
  69.4%  2113ms  hotspot  heavyWork()
  29.7%   905ms  hotspot  lightWork()

TOTAL TIME (callers with significant overhead)
  99.9%  3041ms  main

CALL STACKS
  69.4%  2113ms  main > heavyWork()
  29.7%   904ms  main > lightWork()
```
From 33,553 lines to 13. All the information an LLM needs to tell you, "Optimize `heavyWork()`, it's taking 70% of the CPU," fits in a tweet.
## What ztrace Filters (and Why)
Not everything reported by xctrace is actionable. When profiling an app, I can’t optimize `libdispatch.dylib`. I can’t rewrite `dyld4::PrebuiltLoader::loadDependents`. Those frames are noise when I’m hunting for *hotspots* in **my** code.
ztrace filters at multiple levels:
**System binaries**: Anything in `/usr/lib/` or `/System/` gets discarded. These are frames from the OS and Swift runtime.
**Runtime internals**: Functions like `__swift_instantiateConcreteTypeFromMangledNameV2` or `DYLD-STUB$$sin` technically live in your binary (statically linked), but they’re not your code. Out they go.
**Unresolved symbols**: Production apps (Spotify, for instance) are often *stripped*. Frames show up as raw addresses like `0x104885404`. ztrace filters them out and warns you: "85% of user samples are unsymbolicated." That way you know the profile contains data but needs *dSYMs* to be useful.
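The three filters above can be sketched in a few lines. To be clear, this is my reconstruction of the idea, not ztrace's actual code; the function names and pattern lists are assumptions:

```python
import re

# Hypothetical sketch of ztrace-style frame filtering; the real
# implementation and its pattern lists may differ.
SYSTEM_PREFIXES = ("/usr/lib/", "/System/")
RUNTIME_PATTERNS = re.compile(r"^(__swift_|DYLD-STUB\$\$)")
UNRESOLVED = re.compile(r"^0x[0-9a-fA-F]+$")

def classify(symbol: str, binary_path: str) -> str:
    """Say why a frame is kept or dropped."""
    if binary_path.startswith(SYSTEM_PREFIXES):
        return "drop:system"          # OS / Swift runtime frames
    if RUNTIME_PATTERNS.match(symbol):
        return "drop:runtime"         # statically linked runtime internals
    if UNRESOLVED.match(symbol):
        return "drop:unsymbolicated"  # stripped binary, needs dSYMs
    return "keep"

def filter_frames(frames):
    """Keep user frames; report the unsymbolicated share."""
    kept = [f for f in frames if classify(*f) == "keep"]
    unsym = sum(1 for f in frames if classify(*f) == "drop:unsymbolicated")
    return kept, (unsym / len(frames) if frames else 0.0)
```

The unsymbolicated ratio is what drives the "85% of user samples are unsymbolicated" warning: dropped frames still get counted before they disappear.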
## Testing with Real Apps
The fixture is convenient but artificial. Does it work with a real app? I tested it with Ghostty (the terminal emulator):
```
Process: ghostty   Duration: 3.8s   Template: Time Profiler   Samples: 295   Total CPU: 295ms

SELF TIME
  53.2%  157ms  ghostty  main
   3.7%   11ms  ghostty  renderer.metal.RenderPass.begin
   3.1%    9ms  ghostty  renderer.generic.Renderer(renderer.Metal).rebuildCells
   2.7%    8ms  ghostty  renderer.generic.Renderer(renderer.Metal).drawFrame
   2.4%    7ms  ghostty  renderer.generic.Renderer(renderer.Metal).updateFrame
   2.0%    6ms  ghostty  heap.PageAllocator.alloc
   1.7%    5ms  ghostty  terminal.page.Page.clonePartialRowFrom
   1.7%    5ms  ghostty  font.shaper.coretext.Shaper.shape
```
Now *this* is actionable. You can immediately see: the Metal renderer (render pass, rebuild cells, draw frame) and the font shaping are where the time is going. If you were optimizing Ghostty, you'd know exactly where to start.
And each function comes with its module (`ghostty`), so in an app with multiple frameworks, you'd know whether the bottleneck is in your code or a dependency.
## The Stack (and Why Not Swift)
The original `CLAUDE.md` said Swift. "Makes sense for the use case," I thought. After seeing that 95% of the work is parsing XML and formatting text, I switched to Python.
`xml.etree.ElementTree` parses XML in three lines. In Swift, `XMLParser` is pure SAX: callbacks, mutable state, delegates. That's a lot of machinery for something that should be "give me the tree and let me navigate it."
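To show what "three lines" means in practice, here is parse, navigate, query with `ElementTree`. The element names are illustrative stand-ins, not xctrace's actual export schema:

```python
import xml.etree.ElementTree as ET

# A tiny inline stand-in for an xctrace export; the element
# names here are illustrative, not the real export schema.
doc = """<trace>
  <row><weight>2113</weight><frame>heavyWork()</frame></row>
  <row><weight>905</weight><frame>lightWork()</frame></row>
</trace>"""

root = ET.fromstring(doc)                                     # 1: parse
rows = root.findall("row")                                    # 2: navigate
hottest = max(rows, key=lambda r: int(r.findtext("weight")))  # 3: query
print(hottest.findtext("frame"))  # heavyWork()
```

No delegates, no mutable parser state: you get the whole tree and walk it.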
Plus, a Python script can be distributed with `uv tool install`, while a Swift binary only runs on macOS/arm64. Admittedly, since xctrace itself only exists on macOS, portability isn't a strong argument against Swift here. But distributing with `uv` is still far cleaner than compiling and copying binaries.
## What’s Next
This is v0.1. What's missing:
- **`ztrace record`**: Record and summarize in a single command (convenience, not urgent)
- **Configurable filters**: Exclude specific modules, adjust call stack depth
- **Trace comparison**: Before/after optimization, in diff format
- **Support for Allocations**: Not just CPU, also memory
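Of the roadmap items, trace comparison is the easiest to picture. It could be little more than a diff of two self-time tables; this sketch is purely hypothetical, nothing like it ships in ztrace today:

```python
def diff_summaries(before: dict, after: dict) -> list:
    """Compare per-function self time (ms) between two runs.
    Hypothetical sketch: not part of ztrace's current CLI."""
    rows = []
    for fn in sorted(set(before) | set(after)):
        b, a = before.get(fn, 0), after.get(fn, 0)
        rows.append((fn, b, a, a - b))
    return rows

# Imagined before/after numbers for the hotspot fixture
before = {"heavyWork()": 2113, "lightWork()": 905}
after = {"heavyWork()": 310, "lightWork()": 902}
for fn, b, a, delta in diff_summaries(before, after):
    print(f"{fn:12} {b:>5}ms -> {a:>5}ms ({delta:+d}ms)")
```

A table like that, fed back to the model, would make "did my optimization work?" a one-glance question.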
The repo is on [GitHub](https://github.com/frr149/ztrace) if you want to try it out.
## Integrating It Into Daily Workflow
The beauty of ztrace isn’t running it manually. It’s having Claude Code use it automatically when profiling.
You add this to your `CLAUDE.md` (global or project-specific):
```markdown
### Profiling (xctrace)
- Use `ztrace summary <file.trace>` to read traces. NEVER read the raw XML from xctrace export.
- Workflow: `xctrace record` → `ztrace summary`
- Flags: `--threshold 0.5` (more functions), `--depth 10` (deeper stacks)
```
From there, every time Claude Code needs to profile something, the workflow is:
1. Record a trace with `xctrace record`.
2. Read it with `ztrace summary`.
3. Act on the summary.
Without ztrace, step 2 would generate 30,000 lines of XML that either blow up your context window or drown the signal in noise. With ztrace, Claude has exactly what it needs to tell you “70% of the CPU is in heavyWork(), line 42 of Renderer.swift.”
## The Meta Point
ztrace exists because LLMs are bad at processing raw, large-scale data. They’re good at reasoning about processed, compacted information. Giving Claude 33,000 lines of XML is like handing a doctor a raw DICOM MRI file and asking for a diagnosis. The doctor needs the rendered image, not the raw bytes.
The next time an LLM tells you “the output is too big,” the solution isn’t a model with more context. It’s a better summary. A pipeline that turns raw data into actionable insights before it reaches the model.
Because at the end of the day, that’s what we engineers do: turn noise into signal. With or without AI in the mix.