---
title: "OpenAI scales PostgreSQL to 800 million users with a single writer (and no sharding)"
date: 2026-03-11T20:00:00+01:00
draft: false
slug: "openai-postgresql-800m-users-single-writer"
slug_en: "openai-postgresql-800m-users-single-writer"
description: "OpenAI serves 800M ChatGPT users with a single PostgreSQL primary and ~50 replicas. No sharding, no microservices. Deliberate simplicity beats over-engineering."
tags: ["postgresql", "scalability", "infrastructure", "openai", "databases"]
categories: ["opinion"]

translation:
  hash: ""
  last_translated: ""
  notes: |
    - "ñapa": means "hack/kludge/quick fix". Not strongly derogatory, implies something done the easy way instead of properly.
    - "dicho en cristiano": "in plain language". No religious connotation.
    - "barra del bar": "bar counter" — metaphor for casual conversation setting.
    - "chapuza": "bodge/kludge". Quick-and-dirty approach.
    - "baqueteado": "battle-hardened" / "well-worn from experience". Positive connotation despite sounding rough.
    - "buen rollito": "good vibes" / "feel-good factor". Casual, positive.
---

Every time an article drops about the infrastructure of a large company, half the comments on Hacker News are variations of "of course, they’re using Kubernetes with 47 microservices and a custom consensus protocol distributed database." And when it turns out they’re not—that they’re using barebones PostgreSQL with a single *primary* and some discipline—there’s an awkward silence.

That just happened with OpenAI.

## The numbers no one expected

Bohan Zhang, an infrastructure engineer at OpenAI, recently shared the details of how they scale PostgreSQL for ChatGPT. The numbers:

- **800 million users**
- **A single PostgreSQL *primary*** (writer) on Azure
- **~50 *read replicas***
- **Millions of queries per second**
- **p99 latency of 10-19ms**
- **99.999% uptime**
- **One SEV-0 in a year** (and that was triggered by the viral launch of ImageGen, which brought in 100 million new users in a single week)

Read that again. One. Single. Writer. For 800 million users.

## "But they should shard"

No. And the reason is brutally pragmatic.

Sharding PostgreSQL would have required modifying **hundreds of endpoints** in the application. Every query that assumes all data lives in the same database—which is practically all of them—would have to be rewritten to figure out which shard the data resides in.

The cost of that migration? Months of engineering work, brand-new bugs popping up everywhere, and a transition period where you have to maintain both systems.

What did they do instead? They identified the most write-heavy operations and moved them to Cosmos DB. Not because Cosmos is better than PostgreSQL, but because those specific workloads were better suited to a document-based model. Everything else—the vast majority of business logic—stayed in PostgreSQL.

To put it plainly: instead of overcomplicating the entire system, they isolated the problem and solved it where it hurt most. Surgery with a scalpel, not a chainsaw.

## PgBouncer: from 50ms to 5ms per connection

One of the first bottlenecks they hit was connection latency. PostgreSQL forks a new backend process for every connection. With thousands of simultaneous connections coming in from hundreds of application pods, connection setup alone was eating up 50ms before a single query even ran.

The solution: PgBouncer as a *connection pooler*. It maintains a pool of already-established connections and reuses them. The result? Connection latency dropped to 5ms. A 90% reduction, just by swapping out one plumbing component.

This isn’t cutting-edge tech. PgBouncer has been in production at companies of all sizes for over 15 years. And there it is: a battle-hardened, boring tool solving a problem in one of the most-used applications on the planet.
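As a sketch, a minimal PgBouncer configuration for this pattern might look like the following. Hostnames, pool sizes, and paths here are illustrative assumptions, not OpenAI's actual values:

```ini
; pgbouncer.ini — illustrative values only
[databases]
; clients connect to PgBouncer, which forwards to the real primary
chatgpt = host=pg-primary.internal port=5432 dbname=chatgpt

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; transaction pooling: a server connection is lent out only for the
; duration of a transaction, then returned to the pool for reuse
pool_mode = transaction

; keep a bounded number of real PostgreSQL backends,
; no matter how many clients connect
default_pool_size = 50
max_client_conn = 5000
```

The key choice is `pool_mode = transaction`: thousands of client connections share a few dozen real backends, so the expensive process-fork happens once per pooled connection instead of once per client.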

## The ORM doing 12-table *joins*

This one’s my favorite. Because I’ve seen it in student projects, in startups, in banks—everywhere.

The ORM was generating queries with 12-table *joins*. Not because someone designed it that way, but because the models were interrelated, and the ORM obediently followed the relationships all the way.

The solution wasn't ditching the ORM or switching to manual queries for everything. It was **moving logic into the application**. Instead of asking PostgreSQL to handle a monstrous *join*, they made several simpler queries and stitched the data together in code.

Is it less elegant? Sure. Is it faster? Immensely. Because PostgreSQL can optimize simple queries far better than a 12-table *join* with cross-cutting conditions. And because you can cache partial results and reuse them.

```sql
-- BEFORE: ORM generates this
SELECT u.*, p.*, s.*, t.*, ...
FROM users u
JOIN profiles p ON ...
JOIN settings s ON ...
JOIN teams t ON ...
JOIN ... -- 12 tables
WHERE u.id = $1;

-- AFTER: separate queries, logic in application
SELECT * FROM users WHERE id = $1;
SELECT * FROM profiles WHERE user_id = $1;
-- cacheable, parallelizable, debuggable
```

Each individual query is trivial. PostgreSQL plans and executes them in microseconds. And if one fails or slows down, you know exactly which one.

## The invisible defenses

What I found most brilliant about Bohan Zhang’s article wasn’t the big numbers, but the small defensive measures that prevent everything from spiraling out of control:

### `idle_in_transaction_session_timeout`

If a transaction stays open without doing anything, PostgreSQL kills it after a configurable time. Why does this matter? Because an open transaction blocks autovacuum: as long as it’s alive, vacuum can’t reclaim the dead rows created since it started. Without autovacuum, tables bloat, indexes degrade, and your database gets slower every day.

It’s like leaving the fridge door open. Nothing happens in the first 5 minutes, but if you forget it overnight, everything inside goes bad.
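The setting itself is a one-liner. The value below is illustrative, not OpenAI's actual number:

```sql
-- Kill any session that sits idle inside an open transaction
-- for more than 60 seconds (value is illustrative)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
SELECT pg_reload_conf();

-- Or scoped to a single session or role:
SET idle_in_transaction_session_timeout = '60s';
```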

### Schema changes with a 5-second timeout

When you run an `ALTER TABLE` in PostgreSQL, it needs an exclusive lock on the table. If long-running transactions are active, that lock request waits, and while it waits it blocks every new query on the table behind it. A schema migration that normally takes 200ms can take down your database if a single old transaction is lingering.

OpenAI’s solution: `SET lock_timeout = '5s'`. If the migration can’t acquire the lock in 5 seconds, it aborts. It’s better to fail fast and retry than to block the entire system while waiting.
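In a migration script, that looks like this (the column name is illustrative, not from the article):

```sql
-- Abort if the table lock isn't acquired within 5 seconds;
-- deploy tooling retries later instead of queueing behind
-- an old transaction and blocking all new queries
SET lock_timeout = '5s';

ALTER TABLE users ADD COLUMN new_column text;  -- hypothetical column
```

If the `ALTER TABLE` hits the timeout, it fails with a lock-timeout error and nothing changes; the retry usually succeeds a moment later once the offending transaction is gone.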

### Rate limiting in 4 layers

Not one, not two—four layers of rate limiting:

1. **Edge/CDN** — blocks abusive traffic before it even hits the application.
2. **API gateway** — per-user and per-API-key rate limits.
3. **Application** — limits based on the type of operation.
4. **Database** — connection limits and statement timeouts.

Each layer catches what the previous one misses. Defense in depth. The same “onion philosophy” I use for defenses against hallucinations, but applied to infrastructure.
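The database layer, at least, is plain SQL. A sketch with illustrative role name and values (not OpenAI's actual settings):

```sql
-- Layer 4, on the database itself:
-- cap how many concurrent connections one application role can hold...
ALTER ROLE app_rw CONNECTION LIMIT 200;

-- ...and kill any statement from that role that runs too long
ALTER ROLE app_rw SET statement_timeout = '2s';
```

Even if every layer above fails, a runaway client can neither exhaust the connection slots nor pin a backend with an endless query.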

### Workload isolation by priority

Not all queries are equal. A query to “show the user’s chat” is critical—if it fails, the user sees an error. A query to “generate an analytics report” is important but can wait 30 seconds.

OpenAI routes queries by priority to different read replicas. High-priority replicas handle less load and respond faster. Low-priority replicas can afford heavier loads without impacting the user experience.

It’s common sense, but it requires discipline. You have to classify every query, configure routing, and resist the temptation to send everything to the fast replica “because it’s just one more query.”

### Backfills that take weeks

When you need to populate a new column for 800 million users, you can’t just run `UPDATE users SET new_column = computed_value`. That would lock the table, saturate storage, and probably crash the primary.

At OpenAI, backfills run under strict rate limiting. Weeks. A backfill that takes weeks.

Does that sound awful? It’s the opposite. It’s the decision of a team that understands that the speed of the backfill is irrelevant compared to system stability. Better to take 3 weeks without anyone noticing than 3 hours and trigger a SEV-0 at 2 a.m.
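A common way to implement this kind of throttled backfill is an external job that updates one small batch at a time and sleeps between runs. A minimal sketch, reusing the placeholder names from above (`new_column`, a hypothetical `compute_value()` function, and an arbitrary batch size):

```sql
-- One batch of a rate-limited backfill; an external job repeats this,
-- pausing between runs, until zero rows are updated
UPDATE users u
SET new_column = compute_value(u.id)   -- hypothetical function
WHERE u.id IN (
  SELECT id
  FROM users
  WHERE new_column IS NULL
  ORDER BY id
  LIMIT 10000   -- small batches keep lock time and WAL volume bounded
);
```

Each batch holds row locks for milliseconds, generates a bounded amount of WAL for the replicas to absorb, and the pause between batches is the knob that turns "3 hours and a SEV-0" into "3 weeks and nobody notices."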

### Cascading replication in the pipeline

Currently, they have ~50 replicas connected directly to the primary. Each replica consumes a replication connection and bandwidth from the primary. With 50, it’s manageable. With 100+, it would be a problem.

Their planned solution: cascading replication. Replicas replicating from other replicas instead of directly from the primary. A tree instead of a star. The primary sends data to 5-10 first-level replicas, which then feed the rest.

It’s the same concept as BitTorrent. Instead of everyone downloading from the same server, peers share pieces among themselves. Works for pirated movies, works for WAL segments.
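Mechanically, cascading replication needs no special machinery: a PostgreSQL standby can point `primary_conninfo` at another standby instead of at the primary. A sketch of a second-tier replica's settings, with illustrative hostnames:

```ini
# postgresql.conf on a second-tier standby (plus an empty standby.signal file):
# follow a first-tier replica instead of the primary, forming a tree
primary_conninfo = 'host=replica-tier1-03.internal port=5432 user=replicator'
```

The primary then only serves its handful of first-tier replicas, and adding a hundredth replica costs the primary nothing.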

## The lesson no one wants to hear

The industry has an addiction to over-engineering. Every week, a new database is launched promising to solve problems most companies don’t even have. And every week, engineering teams adopt those technologies because “it scales better” or “it’s more modern,” without asking themselves if PostgreSQL with some discipline would do the job.

OpenAI—the company defining the future of AI, with one of the fastest-growing products in history—uses PostgreSQL. With a single primary. No sharding. No exotic distributed database.

They use PgBouncer (2007). Read replicas (a concept from the ’90s). Connection pooling (as old as relational databases). Rate limiting (invented before most of us were born).

The magic isn’t in the technology. It’s in the discipline:

- Simple queries instead of monstrous joins
- Aggressive timeouts instead of infinite waits
- Workload isolation instead of "everything on one server"
- Migrating only what needs to be migrated, not rewriting the whole system

## At your next standup

Next time someone on your team suggests migrating to a distributed database, or sharding PostgreSQL, or inserting a message queue between the API and the database “because it won’t scale,” show them these numbers.

800 million users. One primary. p99 of 10-19ms. 99.999% uptime.

And ask them: “Is PostgreSQL really our scalability bottleneck? Or are our queries hacky?”

Because almost always, it’s the latter.


Source: *Inside the Postgres Setup Powering 800M ChatGPT Users* — Bohan Zhang, OpenAI. If you read just one infrastructure article this year, make it this one.