---
| title: "RustyClaw: I'm rewriting an AI agent in Rust (because the meme demands it)"
date: 2026-02-24T18:00:00+01:00
draft: false
slug: "rustyclaw-manifesto-rewrite-ai-agent-rust"
slug_en: "rustyclaw-manifesto-rewrite-ai-agent-rust"
description: "I’m porting 8,300 lines of Python to Rust using LLMs as copilots. The real goal: testing adversarial development in a hardcore porting process. With raw data and a hallucination counter."
tags: ["rust", "python", "ai", "llm", "riir", "rustyclaw"]
categories: ["rustyclaw"]
series: ["RustyClaw: Rewrite It In Rust"]
translation:
  hash: ""
  last_translated: ""
  notes: |
    - "RIIR": acronym for the "Rewrite It In Rust" meme. Do not translate; universally recognized among the Rust community.
    - "Mr. Krabs": reference to the SpongeBob character. Use "Mr. Krabs" in English equivalent.
    - "chapuza": roughly means "shoddy work" or "hack job." Don't translate literally - contextual adaptation is key.
    - "I don't care at all": vulgar phrasing should match cultural equivalent in tone and context.
    - "guinea pig": standard metaphor, translate as-is.
    - "bar counter": everyday conversation tone insinuated over drinks; adapt naturally to English.
    - "things would be different": adapt equivalent idiomatically and fluently.
social:
  publish: true
  scheduled_date: 2026-02-28
  platforms: ["twitter", "linkedin"]
  excerpt: "I'm porting an 8,300-line AI agent from Python to Rust. The goal: testing adversarial development in a real port. Honest data about cost, consumption, and hallucinations. Because what’s better than an AGI? An AGI rewritten in Rust."
wordpress:
  publish: true
  categories: [1]
  tags: ["rust", "python", "ai", "llm", "riir", "rustyclaw"]
video:
  generate: false
  style: "educational"
---
> *"You know what’s great about Rust? It doesn’t let you compile crappy code. You know what sucks? Everything you write at the beginning **is** crappy code."*
> — Mr. Krabs, probably

What’s better than an AI agent? An AI agent *rewritten in Rust*.

If you’ve spent more than five minutes on the internet, you’re aware of the meme. It doesn’t matter what project—text editor, DNS server, BMI calculator. Someone will inevitably comment, "you should rewrite it in Rust." It’s the *Rewrite It In Rust*—RIIR for friends—and it’s as unavoidable as gravity.

Well, I’m actually doing it. I’m going to port 8,300 lines of a Python AI agent to Rust. But not just because the meme demands it (okay, maybe a little). I’m doing it because I need a guinea pig.
## The thesis
For weeks now, I’ve been writing about [*silent failures*](/posts/silent-failure-ai-makes-stuff-up-tests-everything-fine/), about the [five defenses against hallucinations](/posts/five-defenses-code-hallucinations/), about how an LLM can generate code that compiles, passes tests, and is still wrong. I even gave it a name: **adversarial development**. *Never trust, always verify.*

A lot of theory. Now it’s time to prove it.

I needed a project with three key traits: constrained scope (not a new app with ever-changing requirements), a clear source of truth (the Python code that already works), and enough complexity for the LLM’s hallucinations to have room to hide. A pure port checks all three boxes: the input and expected output already exist. If the Rust version doesn’t behave exactly like the Python one, there’s a bug. Simple as that.

And since I’m going to port something, why not use it as an opportunity to properly learn Rust? The *borrow checker*, *ownership*, *lifetimes*... I’ve spent years reading all about it and touching none of it. Things would be different if I stopped reading tutorials for the 20th time and actually tackled a real project.
## The patient
It’s called [nanobot](https://github.com/HKUDS/nanobot). It’s a personal AI agent derived from OpenClaw: a nifty tool that links LLMs (Claude, GPT, DeepSeek, you name it) to chat channels—Telegram, Discord, Slack, email—and gives them hands. It can read/edit files, run commands, browse the web, schedule cron tasks, and maintain persistent memories between conversations.

It works. It’s been running fine. In Python.

What’s the problem? It’s *single-threaded*. One message at a time. Send it three messages back-to-back, and they queue up like a Saturday morning line at Walmart. It uses about 50MB of RAM to essentially shuffle JSON between APIs. And its error handling is the type you’re embarrassed about: `return f"Error: {str(e)}"` scattered all over.

To put it bluntly: it works, but it’s a giant hack. Perfect candidate.
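To make the contrast concrete, here’s a minimal sketch of what the Rust side buys you. The `AgentError` type and `load_token` function are hypothetical illustrations, not nanobot’s actual API, and the real port would derive the `Display` impl with `thiserror` instead of writing it by hand. The shape is the point: the caller is forced to handle the failure instead of receiving a stringly-typed `"Error: ..."`.

```rust
use std::fmt;

// Hypothetical error type for illustration; in the real port this
// would be derived with `thiserror` rather than hand-written.
#[derive(Debug, PartialEq)]
enum AgentError {
    Provider { status: u16 },
    Config(String),
}

impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AgentError::Provider { status } => write!(f, "provider returned HTTP {status}"),
            AgentError::Config(key) => write!(f, "missing config key: {key}"),
        }
    }
}

// The type signature makes the failure case impossible to ignore:
// no `str(e)` flying under the radar for months.
fn load_token(env: Option<&str>) -> Result<String, AgentError> {
    env.map(str::to_owned)
        .ok_or_else(|| AgentError::Config("TELEGRAM_TOKEN".into()))
}

fn main() {
    match load_token(None) {
        Ok(t) => println!("token: {t}"),
        Err(e) => println!("startup failed: {e}"),
    }
}
```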
## Why Rust (besides the meme)?
I could fix it in Python. I could dial up the `asyncio`, tighten up error-handling with custom exceptions, and optimize memory. The sane option.

But sane doesn’t give me a *test bench* for adversarial development. Refactoring in Python lacks an external source of truth—the "before" and "after" would share language, libraries, and the LLM’s biases. A port to a different language? That’s different. If Rust’s output differs from Python’s for the same input, somebody’s lying. And that’s exactly the kind of verification I want to test.

Plus, Rust comes with properties that make the experiment more interesting:
- **The compiler as a first line of defense.** Nulls, type mismatches, data races—entire categories of bugs that might silently creep into Python won’t even compile in Rust. How many LLM hallucinations can the compiler block before they hit a test? I want to measure that.
- **True concurrency.** `tokio` allows one `spawn` per conversation. In Python, that’s a pain. This is the one functional improvement that really justifies the port.
- **Static binaries.** A 10MB executable instead of a `pip install` with 47 dependencies. That’s a win for distribution.
- **It’s cool.** Not technically a reason, but I don’t care.
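The "one worker per conversation" shape from the concurrency point can be sketched in a few lines. This version uses OS threads and channels to stay dependency-free; the real port would use `tokio::spawn` tasks, which follow the same pattern with async handlers. The `Incoming` type and the echo handler are hypothetical stand-ins.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical incoming-message type for illustration.
struct Incoming {
    chat_id: u64,
    text: String,
}

// One worker per message: three messages back-to-back no longer queue
// behind each other. In the real port this would be an async fn
// launched with `tokio::spawn`.
fn handle(msg: Incoming, out: mpsc::Sender<String>) {
    out.send(format!("[{}] echoed: {}", msg.chat_id, msg.text)).unwrap();
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let msgs = vec![
        Incoming { chat_id: 1, text: "hola".into() },
        Incoming { chat_id: 2, text: "hello".into() },
        Incoming { chat_id: 3, text: "hallo".into() },
    ];
    let workers: Vec<_> = msgs
        .into_iter()
        .map(|m| {
            let tx = tx.clone();
            thread::spawn(move || handle(m, tx))
        })
        .collect();
    drop(tx); // close our sender so the receiver sees end-of-channel
    for w in workers {
        w.join().unwrap();
    }
    let replies: Vec<String> = rx.iter().collect();
    println!("{} replies handled concurrently", replies.len());
}
```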
## The adventure (and the invite)
RustyClaw—that’s the port’s name—is going to be a publicly documented experiment. Each module I port will be its own blog post. With real data: how many tokens used, cost, how often the AI hallucinated, and how long I fought with the *borrow checker*. No sugarcoating.

If I spend 3 hours on something I could have done in Python in 10 minutes, I’ll admit it. If the LLM invents a non-existent *crate* (spoiler: it will), I’ll detail it. If I realize at the end this port wasn’t worth it, I’ll confess to that too.

Everyone says, "I used AI to write code." No one publishes how much it cost, how often it lied to them, or if the code held up in production. That’s exactly what I’m going to do.

And I want you to come along for the ride. Because this is going to be an adventure—filled with compiler battles, "WHY WON’T THIS COMPILE IT’S OBVIOUS" moments, and small victories when a differential test passes green. It’s going to be fun. Or, at the very least, honest.
## The stack (cheat sheet)
If you’re a Pythonista, the left column will look familiar. If you’re a Rustacean, the right. If you’re neither, welcome to the chaos.

| Layer | Python (nanobot) | Rust (rustyclaw) |
|-------|------------------|------------------|
| Async runtime | `asyncio` | `tokio` |
| HTTP | `httpx` | `reqwest` |
| LLM routing | `litellm` | **Nonexistent** — custom router |
| Telegram | `python-telegram-bot` | `teloxide` |
| Discord | `websockets` (raw) | `tokio-tungstenite` (raw) |
| Config | `pydantic` | `serde` + `figment` |
| CLI | `typer` | `clap` |
| Errors | `str(e)` | `anyhow` + `thiserror` |
| Logging | `loguru` | `tracing` |
| AI copilot | — | Claude Code + Codex |
| Task runner | `make` | `just` |
| Issue tracker | — | `linear` CLI |
The row that hurts most is LiteLLM. In Python, it routes 100+ LLM providers in a single call. Nothing comes close in Rust, so I’ll need to roll my own router. The upside? About 80% of LLM providers conform to OpenAI’s API, so with `async-openai` plus a custom base URL most use cases are covered. Anthropic will need its own implementation.

Roughly 300 lines of Rust. Sounds manageable. *Sounds.*
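To give a feel for the custom router, here’s a sketch of its dispatch core. The provider set, base URLs, and the model-prefix heuristic are illustrative assumptions, not the final design; the real version would hang an HTTP client (`reqwest` / `async-openai`) off each variant.

```rust
// Hypothetical router sketch: names, URLs, and the prefix heuristic
// are illustrative assumptions, not the final design.
#[derive(Debug, PartialEq)]
enum Provider {
    OpenAiCompatible { base_url: &'static str },
    Anthropic, // needs its own request/response mapping
}

fn route(model: &str) -> Provider {
    if model.starts_with("claude") {
        Provider::Anthropic
    } else if model.starts_with("deepseek") {
        // Example of an OpenAI-compatible provider with its own base URL
        Provider::OpenAiCompatible { base_url: "https://api.deepseek.com/v1" }
    } else {
        // ~80% of providers speak the OpenAI wire format; default to it
        Provider::OpenAiCompatible { base_url: "https://api.openai.com/v1" }
    }
}

fn main() {
    println!("{:?}", route("claude-sonnet-4"));
    println!("{:?}", route("gpt-4o"));
}
```

Routing itself is a few dozen lines of matching; the 300-line estimate is mostly the Anthropic-specific request and response mapping.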
## Anti-hallucination strategy (the serious bit)
This is where the adversarial development theory meets reality. An LLM assisting in a port this size is a machine for plausibly inventing things.

The top risk isn’t that the code won’t compile—Rust doesn’t let garbage compile. The risk is that it compiles, passes tests, and silently does the wrong thing. Exactly the *silent failure* I wrote about two weeks ago.
Five layers of defense:

**1. Rust’s compiler.** Eliminates nulls, type mismatches, and data races. First free line of defense. But just because it compiles doesn’t make it right.

**2. Differential tests.** Same input → Python nanobot → output. Same input → RustyClaw → output. If they don’t match, something’s off. The Python code is the source of truth. This is the backbone of the experiment.

**3. Provenance tracking.** Each ported file gets a header with its original Python source, LLM session, and test differential results. Total traceability.

**4. Crate verification.** Every crate suggested by the LLM → manually verify on crates.io and docs.rs. LLMs will confidently propose non-existent crates and APIs that just don’t work.

**5. Incident logging.** Every detected hallucination → an issue logged with a `hallucination` label. Material for posts and lessons learned.
The golden rule:

> **The verification system must be external to the generator.**

If the LLM writes the code, the tests, and the fixtures, you’re validating fiction with fiction. Differential testing against the original Python code naturally breaks the cycle and makes the port inherently verifiable.
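Here’s what one differential test looks like in miniature. `parse_config` is a hypothetical ported function, and the "Python output" is a fixture captured beforehand from the running nanobot for the same input; the final assertion is the whole methodology in one line.

```rust
// Hypothetical ported function; stands in for any module of the port.
fn parse_config(raw: &str) -> Result<(String, u64), String> {
    let mut name = None;
    let mut timeout = None;
    for line in raw.lines() {
        match line.split_once('=') {
            Some(("name", v)) => name = Some(v.trim().to_string()),
            Some(("timeout", v)) => timeout = v.trim().parse().ok(),
            _ => {}
        }
    }
    match (name, timeout) {
        (Some(n), Some(t)) => Ok((n, t)),
        _ => Err("missing field".into()),
    }
}

fn main() {
    let input = "name=nanobot\ntimeout=30";
    // Fixture captured earlier from the Python side for this exact input.
    let python_output = ("nanobot".to_string(), 30u64);
    let rust_output = parse_config(input).unwrap();
    // The external source of truth: Rust must match Python, byte for byte.
    assert_eq!(rust_output, python_output, "Rust diverged from Python");
    println!("differential test passed");
}
```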
## *Does it matter?*
So, the uncomfortable question—does porting this to Rust even matter?

| Metric | Python | Rust (estimated) | Does it matter? |
|--------|--------|------------------|-----------------|
| Response latency | ~200ms overhead | ~5ms overhead | No. The LLM takes 2-5 seconds anyway. |
| RAM | ~50MB | ~5MB | No. My server has 8GB. |
| Concurrency | 1 message at a time | N messages in parallel | **Yes.** |
| Startup time | ~2s | ~50ms | Meh. |
| Binary | `pip install` + 47 deps | Single executable | **Yes.** |
| Type safety | `str(e)` everywhere | `Result<T, E>` | **Yes.** |
| The cool factor | None | High | Subjective. |
Three out of seven. Four, if we’re being generous. The latency and RAM improvements are meaningless since the bottleneck is always the LLM call. Concurrency matters for multiple users. A static binary is a real upgrade. And the type safety? After seeing how many bugs `str(e)` lets fly under the radar for months, yeah, that matters.

Does it justify weeks of work? As a standalone port, probably not. As a testbed for adversarial development with published real-world data? I think yes. By the end of this series, we’ll have hard numbers—not opinions.
## The raw numbers
Every work session will be logged in a public CSV in the repo:
```csv
date,llm,model,module,tokens_in,tokens_out,cost_usd,duration_min,loc_python,loc_rust,hallucinations,tests_pass
```