I have a menu bar app that needs to know a single number. A percentage from 0 to 100. To fetch it, it pings a server every 30 seconds.

Do the math: 30 seconds equals 2 calls per minute, 120 per hour, 960 over an 8-hour workday. Nearly a thousand HTTP requests a day to check a number that sometimes doesn’t change for 20 minutes.

That’s not monitoring. That’s harassment.

The real problem isn’t technical. It’s political.

When you depend on an API you don’t control — that isn’t public, doesn’t have documented rate limits, and belongs to a company that can change its Terms of Service on a random Tuesday — every unnecessary request is a risk. Not of a timeout. Of being cut off.

The endpoint I use isn’t documented. It works today. It’s worked for months. But every request I send is just another line in some log that someone at Anthropic may notice and decide a third-party app is making too much noise.

So the question isn’t “How do I make polling faster?” but “How do I poll as little as possible without losing key information?”

And that, my friend, is where a reasonable engineer would write a simple if statement — and an engineer with a guilty pleasure for over-engineering builds a Kalman filter.

The naive solution (and why it fails)

The first instinct is straightforward: if the number hasn’t changed, don’t ask.

if value == previous_value:
    interval = min(interval * 2, MAX_INTERVAL)   # nothing changed: back off
else:
    interval = 30                                # change detected: reset to base

That fails miserably. The value changes when you do something (send messages, use tokens). But it also changes when you don’t do anything — the quota has a sliding 5-hour window, so old tokens expire on their own. And if you’re using the service from another device, the value increases without you knowing it.

You can’t just check if it’s changed. You need to predict when it will change and with what level of confidence.

Kalman for people in a hurry

A Kalman filter is a machine for blending two imperfect sources of information.

Imagine you’re in a windowless room and want to know the temperature outside. You have two options:

  1. Your mental model: “It’s 3 PM in March in Madrid, so I’d estimate it’s about 64°F.” That’s a reasonable guess, but not perfect — it might have rained, or there could be wind.
  2. A cheap thermometer: You can step out to the balcony and check, but your thermometer is budget-friendly and bounces around by about ±5°F.

Neither source is perfect. The Kalman filter's prescription is simple: combine them, weighting whichever source is more reliable at any given moment.

If you just checked the thermometer 10 seconds ago, your mental model is good — rely on it and skip another check. But if it’s been an hour since your last reading, your mental model has degraded — you should step outside.

The key concept is variance, a number that represents “how much I trust my current estimate.” Right after checking the thermometer, variance is low. Over time, it grows. When it reaches a certain threshold, the filter says, “I don’t trust myself anymore; I need real data.”

In my case:

  • The mental model = the local token usage cost. I know how much I’ve used on Claude Code, so I can calculate how much the quota likely increased.
  • The thermometer = Anthropic’s API. A real data point, but each request has a political and energy cost.
  • The variance = uncertainty that grows over time. If I’ve used the service from another browser or device, my local model has no idea about that — and that degrades its prediction.

A full-fledged Kalman filter (multidimensional, covariance matrices) would be overkill for this. My solution is scalar: one state (utilization), one sensor (the API), a linear model (cost per budget). 20 lines of code. The minimum viable implementation to solve the problem.

Translated to my specific problem:

  • Prediction: estimated_utilization = last_real_value + (new_local_cost / budget) × 100
  • Correction: Reset the variance to zero every time the server responds.
  • Uncertainty: Variance grows linearly over time. σ = √(Q × seconds_since_last_correction).
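The whole scalar filter really does fit in about 20 lines. Here's a sketch under those three bullets; the class name, `q`, and the method signatures are my own illustration, not the app's actual code:

```python
import math

class ScalarKalman:
    """Minimal scalar Kalman-style estimator for quota utilization (0-100%).

    q is the process noise per second (how fast trust decays);
    budget is the token budget for the current window.
    """

    def __init__(self, budget: float, q: float = 0.01):
        self.budget = budget
        self.q = q
        self.utilization = 0.0    # last corrected estimate, in %
        self.elapsed = 0.0        # seconds since last real measurement

    def predict(self, local_cost: float, dt: float) -> float:
        """Advance the estimate using locally known token spend."""
        self.utilization += (local_cost / self.budget) * 100.0
        self.elapsed += dt
        return self.utilization

    @property
    def sigma(self) -> float:
        """Uncertainty grows with time since the last correction."""
        return math.sqrt(self.q * self.elapsed)

    def correct(self, measured: float) -> None:
        """A real API response resets both the estimate and the variance."""
        self.utilization = measured
        self.elapsed = 0.0
```

The asymmetry is the point: `predict` is free and runs constantly; `correct` is the expensive operation you're trying to call as rarely as possible.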

The trick: the filter decides when to query

Here’s where over-engineering becomes justified. The filter doesn’t just estimate the value — it decides when it needs real data. Five rules, evaluated on every tick:

| Rule | Trigger | Why |
|---|---|---|
| Window reset | now ≥ resetsAt | Tokens expired. The previous data is invalid. |
| High uncertainty | σ > 5% | I trust my prediction less. |
| Threshold crossing | Confidence interval crosses 80%, 95%, or 100% | My estimate is near a critical value. The user needs to know. |
| Proximity | Utilization within 8% of a threshold | I might have crossed a line without noticing (external activity). |
| Safety timeout | 15 minutes without real data | Just in case. Paranoia is a virtue in monitoring software. |

If none of the rules trigger, the filter says, “Relax, I’ve got this,” and the app doesn’t make the HTTP request. The value displayed to the user is the local estimate.
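The five rules collapse into one predicate evaluated per tick. A sketch, assuming the thresholds and parameter names above (the function shape is mine, not the app's):

```python
THRESHOLDS = (80.0, 95.0, 100.0)

def should_query(now: float, resets_at: float, utilization: float,
                 sigma: float, last_real_data: float) -> bool:
    """Return True if the filter no longer trusts its local estimate."""
    if now >= resets_at:                        # 1. window reset: old data invalid
        return True
    if sigma > 5.0:                             # 2. high uncertainty
        return True
    lo, hi = utilization - sigma, utilization + sigma
    if any(lo <= t <= hi for t in THRESHOLDS):  # 3. confidence interval crosses a threshold
        return True
    if any(abs(utilization - t) <= 8.0 for t in THRESHOLDS):
        return True                             # 4. proximity to a critical value
    if now - last_real_data >= 15 * 60:         # 5. safety timeout
        return True
    return False
```

Note that rules 3 and 4 overlap by design: one catches uncertainty straddling a threshold, the other catches a confident estimate sitting close enough that unseen external activity could have pushed it over.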

And that matters: the local estimate costs zero network, zero battery, zero risk. It’s pure math in memory.

The numbers: before and after

On a typical day with steady quota usage (moderate use, no spikes):

| Scenario | Requests/hour | Requests/day (8h) |
|---|---|---|
| Fixed 30s polling | 120 | 960 |
| With Bayesian estimator | 15-30 | 120-240 |
| Estimator + dormant | 4-10 | 30-80 |

That’s a 75-97% reduction in network calls. Not bad for “just” making local estimates between real requests.

But wait, there’s more (adaptive degradation)

The Kalman filter solves the “when to query” problem. But there’s another layer: how much effort to put into querying.

The app has a polling policy that adjusts the base interval based on context:

Recent activity (<10 min) → 30s
Moderate idle (10 min - 1h) → 120s
Extended idle (>1h) → 300s
Quota > 80% → Always 30s (critical zone)
Low power mode → Double base interval
Consecutive errors → Exponential backoff (up to 5 min)

Each level is a decision about “how much information I need right now.” If you’re not coding, why burn battery checking your quota every 30 seconds? If your laptop’s at 15% battery, is it worth doubling HTTP requests?
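The policy is a pure function of context. A sketch using the numbers from the list above (function name, argument shapes, and the precedence between rules are my assumptions):

```python
def base_interval(idle_seconds: float, utilization: float,
                  low_power: bool, consecutive_errors: int) -> float:
    """Pick a polling interval (in seconds) from the current context."""
    if consecutive_errors > 0:
        # Exponential backoff on errors, capped at 5 minutes.
        return min(30 * 2 ** consecutive_errors, 300)
    if utilization > 80.0:
        return 30                  # critical zone: always fast
    if idle_seconds < 10 * 60:
        interval = 30              # recent activity
    elif idle_seconds < 60 * 60:
        interval = 120             # moderate idle
    else:
        interval = 300             # extended idle
    if low_power:
        interval *= 2              # low power mode doubles the base
    return interval
```

Keeping it pure makes the policy trivially testable: no timers, no clocks, just inputs and an interval out.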

Dormant mode: when the app puts itself to sleep

And here comes my favorite part. The one that probably wasn’t necessary, but left me grinning with the satisfaction of creating something unnecessarily elegant.

When the Bayesian estimator produces five consecutive estimates where the value changes by less than 0.5%, the app enters dormant mode:

  1. Stops the timer.
  2. Halts estimation.
  3. Listens to the filesystem.

Why the filesystem? Because if you’re using the service, local files get generated. When the file watcher detects activity, the app wakes up, makes an immediate API call to ground itself in reality, and resumes its normal cycle.

It’s like a sleeping dog by the door. It doesn’t use energy, but the moment it hears the key, it’s instantly awake.

The result: if you stop working at 2 PM and return at 4 PM, the app makes zero requests during those two hours. Zero. Not even a 5-minute polling cycle, no keepalive, no heartbeat. The timer literally doesn’t exist. And when you come back, the data is refreshed in milliseconds.
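The dormancy trigger itself is tiny. Here's a sketch of the "five consecutive estimates changing by less than 0.5%" check (the class and names are my illustration; the real app pairs this with an FSEvents watcher, which I've left out):

```python
class DormancyDetector:
    """Enter dormant mode after N consecutive near-flat estimates."""

    def __init__(self, required: int = 5, epsilon: float = 0.5):
        self.required = required    # consecutive flat estimates needed
        self.epsilon = epsilon      # max change (in %) that counts as "flat"
        self.previous = None
        self.flat_count = 0

    def observe(self, estimate: float) -> bool:
        """Feed one estimate; return True when it's time to go dormant."""
        if self.previous is not None and abs(estimate - self.previous) < self.epsilon:
            self.flat_count += 1
        else:
            self.flat_count = 0     # any real movement resets the streak
        self.previous = estimate
        return self.flat_count >= self.required
```

Any real movement resets the streak, so the app only sleeps once the signal has been genuinely flat, not merely quiet for one tick.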

“Wouldn’t a simple setInterval every 5 minutes have worked just as well?”

Yes. Much easier. And probably fine for 90% of users.

But there’s a key difference when your app runs 8 hours a day in the background:

| | setInterval(5min) | Estimator + dormant |
|---|---|---|
| Idle requests/day | 96 | 0 |
| Active requests/day | 96 | 30-80 (adaptive) |
| Update latency | 0-5 min | <1s (FSEvent wake) |
| Battery drain when idle | Constant | None |
| Critical zone accuracy | Same (5 min delay) | 30s (zone > 80%) |

The key row is update latency. With a fixed 5-minute timer, if your quota jumps from 78% to 95% between ticks, you won’t know for up to 5 minutes. With the Bayesian estimator, the interval drops to 10 seconds when it predicts a change, and the filter pings the server the moment the confidence interval crosses 80%.

Put in plain language: it reacts faster while making fewer requests.

The serious bit: why this is responsible software

Let me take off my over-engineer hat and put on my regular engineer hat.

Every HTTP request your app makes in the background has a cost — a cost you pay, a cost the server pays, and a cost the planet pays. This isn’t hyperbole. It’s thermodynamics. A network wakeup on a sleeping laptop turns on the WiFi radio, negotiates TLS, waits for a response, processes the data, and then goes back to sleep. Multiply that by a thousand apps doing the same thing, and it’s part of the reason your MacBook battery lasts 6 hours instead of 10.

Apple knows this. That’s why macOS has App Nap, Timer Coalescing, and penalizes apps with high Energy Impact. My starting point was an app with an 857 Energy Impact. My goal was to bring that down to below 5.

The Bayesian estimator plus dormant mode wasn’t just a whim. It was the only way to achieve that number without sacrificing the user experience. Making fewer requests was mandatory. Doing it intelligently was the challenge.

The recipe, in case it helps

If you have an app that polls a server and you want to reduce requests without sacrificing responsiveness:

  1. Measure if you can predict locally. If the value you’re fetching depends on inputs you also have locally, you can interpolate between server requests.

  2. Model uncertainty. It’s not enough to predict. You need to know how much you trust the prediction. A scalar Kalman filter is 20 lines of code.

  3. Define decision thresholds. What value ranges require accuracy? Don’t waste precision where it doesn’t matter (e.g., 0-60%), and focus polling where it does (80-100%).

  4. Adapt to context. Low battery, extended idle time, server errors — each context has a different cost for making a request. Your polling should reflect this.

  5. Have a zero mode. If there’s no activity, do nothing. Literally, nothing. Not a longer timer — nothing. A filesystem or network event will wake you up when needed.

The guilty pleasure

I’ll be honest: Did I need a Kalman filter for a menu bar app that displays a percentage? Probably not. A couple of if statements with a few heuristics would have solved 80% of the problem.

But the other 20% is the difference between an app that “sort of works” and one a user can leave running for 12 hours without even noticing it’s there. Between an 857 Energy Impact and less than 5. Between 960 requests a day and 30.

Sometimes, the guilty pleasure of over-engineering is exactly what the problem needed. You just didn’t know it until you built it.

And if someone at Anthropic ever looks at their server logs and sees my app making 30 requests a day instead of a thousand, I hope they think, “This guy really put in the work.” And they don’t cut me off.