Each individual query is trivial. The query planner executes them in microseconds. And if one fails or slows down, you know exactly which one.
The invisible defenses
What I found most brilliant about Bohan Zhang’s article wasn’t the big numbers, but the small defensive measures that prevent everything from spiraling out of control:
idle_in_transaction_session_timeout
If a transaction stays open without doing anything, PostgreSQL kills it after a configurable time. Why does this matter? Because an open transaction blocks autovacuum: vacuum can't remove dead rows that are still visible to that transaction. Without autovacuum, tables bloat, indexes degrade, and your database slows down a little more each day.
It’s like leaving the fridge door open. Nothing happens in the first 5 minutes, but if you forget it overnight, everything inside goes bad.
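Enabling the timeout is a one-liner. A minimal sketch (the 30-second value is illustrative; the article doesn't disclose OpenAI's actual setting):

```sql
-- Abort any session that sits idle inside an open transaction
-- for more than 30 seconds, so it can't hold back autovacuum.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s';
SELECT pg_reload_conf();  -- apply without restarting
```

The setting can also be applied per role or per session with `SET`, which is useful for targeting only interactive or ad-hoc connections.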
Schema changes with a 5-second timeout
When you run an ALTER TABLE in PostgreSQL, it requires a lock on the table. If long-running transactions are active, that lock waits. While it’s waiting, it blocks all new queries. A schema migration that normally takes 200ms could take down your database if there’s an old transaction lingering.
OpenAI’s solution: SET lock_timeout = '5s'. If the migration can’t acquire the lock in 5 seconds, it aborts. It’s better to fail fast and retry than to block the entire system while waiting.
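In practice the timeout is set in the same session that runs the migration. A sketch (the table and column are hypothetical):

```sql
-- Fail fast: if the ALTER TABLE can't acquire its lock within 5 seconds,
-- abort instead of queueing behind a long transaction and blocking
-- every new query against the table.
SET lock_timeout = '5s';
ALTER TABLE users ADD COLUMN last_seen timestamptz;
```

If the statement aborts with a lock timeout error, the migration tool simply retries later, which is far cheaper than a stalled table.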
Rate limiting in 4 layers
Not one, not two—four layers of rate limiting:
- Edge/CDN — Blocks abusive traffic before it even hits the application.
- API gateway — Per-user/API key rate limits.
- Application — Limits based on the type of operation.
- Database — Connection limits and statement timeouts.
Each layer catches what the previous one misses. Defense in depth. The same “onion philosophy” I use for defenses against hallucinations, but applied to infrastructure.
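The innermost layer, the database itself, can be sketched with per-role settings (role name and values are hypothetical, not from the article):

```sql
-- Database-layer limits: cap how many connections the app role may hold,
-- and kill any query that runs longer than 2 seconds.
ALTER ROLE chat_app CONNECTION LIMIT 200;
ALTER ROLE chat_app SET statement_timeout = '2s';
```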
Workload isolation by priority
Not all queries are equal. A query to “show the user’s chat” is critical—if it fails, the user sees an error. A query to “generate an analytics report” is important but can wait 30 seconds.
OpenAI routes queries by priority to different read replicas. High-priority replicas handle less load and respond faster. Low-priority replicas can afford heavier loads without impacting the user experience.
It’s common sense, but it requires discipline. You have to classify every query, configure routing, and resist the temptation to send everything to the fast replica “because it’s just one more query.”

Backfills that take weeks
When you need to populate a new column for 800 million users, you can't just run UPDATE users SET new_column = computed_value. That single statement would rewrite every row in one giant transaction, bloating the table, flooding the WAL, saturating storage, and probably crashing the primary.
At OpenAI, backfills run under strict rate limiting. Weeks. A backfill that takes weeks.
Does that sound awful? It’s the opposite. It’s the decision of a team that understands that the speed of the backfill is irrelevant compared to system stability. Better to take 3 weeks without anyone noticing than 3 hours and trigger a SEV-0 at 2 a.m.
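The usual pattern for this kind of slow backfill is small batches with pauses in between, so vacuum and replication can keep up. A sketch (column names and batch size are hypothetical):

```sql
-- One batch of a rate-limited backfill: touch at most 1,000 rows,
-- commit, sleep, repeat until the UPDATE reports 0 rows affected.
UPDATE users
SET    new_column = computed_value
WHERE  id IN (
    SELECT id
    FROM   users
    WHERE  new_column IS NULL
    ORDER  BY id
    LIMIT  1000
);
```

The driving script sleeps between batches; tightening or loosening that sleep is the rate limit.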
Cascading replication in the pipeline
Currently, they have ~50 replicas connected directly to the primary. Each replica consumes a replication connection and bandwidth from the primary. With 50, it’s manageable. With 100+, it would be a problem.
Their planned solution: cascading replication. Replicas replicating from other replicas instead of directly from the primary. A tree instead of a star. The primary sends data to 5-10 first-level replicas, which then feed the rest.
It’s the same concept as BitTorrent. Instead of everyone downloading from the same server, peers share pieces among themselves. Works for pirated movies, works for WAL segments.
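On the PostgreSQL side, a cascading standby is just a replica whose replication source is another replica. A sketch (host name is hypothetical; since PostgreSQL 13 this setting takes effect on reload):

```sql
-- On a second-level replica: replicate from a first-level replica
-- instead of the primary, turning the star into a tree.
ALTER SYSTEM SET primary_conninfo =
    'host=replica-tier1.example.com port=5432 user=replicator';
SELECT pg_reload_conf();
```

The primary then only pays the replication cost for its handful of first-level children.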
The lesson no one wants to hear
The industry has an addiction to over-engineering. Every week, a new database is launched promising to solve problems most companies don’t even have. And every week, engineering teams adopt those technologies because “it scales better” or “it’s more modern,” without asking themselves if PostgreSQL with some discipline would do the job.
OpenAI—the company defining the future of AI, with one of the fastest-growing products in history—uses PostgreSQL. With a single primary. No sharding. No exotic distributed database.
They use PgBouncer (2007). Read replicas (a concept from the ’90s). Connection pooling (as old as relational databases). Rate limiting (invented before most of us were born).
The magic isn’t in the technology. It’s in the discipline:
- Simple queries instead of monstrous joins
- Aggressive timeouts instead of infinite waits
- Workload isolation instead of “everything on one server”
- Migrating only what needs to be migrated, not rewriting the whole system
At your next standup
Next time someone on your team suggests migrating to a distributed database, or sharding PostgreSQL, or inserting a message queue between the API and the database “because it won’t scale,” show them these numbers.
800 million users. One primary. p99 of 10-19ms. 99.999% uptime.
And ask them: “Is PostgreSQL really our scalability bottleneck? Or are our queries hacky?”
Because almost always, it’s the latter.
Source: Inside the Postgres Setup Powering 800M ChatGPT Users — Bohan Zhang, OpenAI. If you read just one infrastructure article this year, make it this one.