Systems Behind the Games We Love

April 27, 2026 • System Design

system-design, games, architecture, networking

I've been planning to write something about system design for weeks now. For the past 4 months I've been reading a lot of blogs about system design, concurrency patterns, and (a recent addition) distributed systems. So here we are. But why games?

Because latency is a lot more frustrating in games than anywhere else. Anything above 150ms is extremely noticeable: you die before you even get to see the enemy in Valorant, the ball somehow enters the goal when you cleared it head-on in Rocket League. Games expose system design tradeoffs that backend systems hide. A payment API can queue a request for 500ms and nobody notices. But 500ms in an FPS? Unplayable, uninstall right away.

So I started learning how these games make latency unnoticeable (if you have good internet :P)

I have categorized games into 4 pools for now:

Pool 1 - Turn Based Games
Pool 2 - Fighting, Shooter & Physics Games
Pool 3 - MOBA (League of Legends, Dota 2)
Pool 4 - MMOs

Each pool optimizes for different constraints - player count, state size, acceptable latency. We'll dig into actual systems, their tradeoffs, and why they are architected that way. I might not go fully in depth, as the sources for this blog are hidden behind 1-hour talks and rarely covered in articles.

Let's start with the simplest one: turn-based games that can afford to wait for the server.

Pool 1 - Turn Based Games

The tradeoff here is: Perfect Consistency over instant feedback (Consistency >> Availability)

In turn-based games like chess, Hearthstone, or Uno, most of the time is spent thinking and planning, not smashing 5 inputs per second. The only concerns are that no two players can move at the same time and that every move submitted is valid.

This is where atomic transactions come into play: either every validation check passes and the move is applied, or the whole move is rolled back for the client who initiated it and neither client sees it.
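A minimal sketch of that all-or-nothing behaviour, assuming a toy board with no real chess rules. The names `Game`, `InvalidMove`, and `apply_move` are made up for illustration; the point is only that a rejected move leaves the old state untouched.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Game:
    board: dict = field(default_factory=dict)  # square -> piece, e.g. {"e2": "P"}
    turn: str = "white"
    version: int = 0  # bumped once per accepted move


class InvalidMove(Exception):
    pass


def apply_move(game: Game, player: str, src: str, dst: str) -> Game:
    """Return a new Game with the move applied, or raise and leave `game` untouched."""
    if player != game.turn:
        raise InvalidMove("not your turn")
    if src not in game.board:
        raise InvalidMove("no piece on source square")
    # Work on a copy so a failure halfway through can never leak partial state.
    new = copy.deepcopy(game)
    new.board[dst] = new.board.pop(src)
    new.turn = "black" if game.turn == "white" else "white"
    new.version += 1
    return new
```

The caller swaps in the returned `Game` only on success, so both clients either see the full move or no move at all.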

Handling state in turn-based games is easy because it is tiny (e.g. chess: an 8x8 grid with 32 pieces).

Let's look at the architecture of Lichess (an open source alternative to Chess.com):

They use WebSockets instead of HTTP polling
State lives in Redis for active games (fast reads), MongoDB for completed games (cheap storage)

When a move happens:

Validate in-memory (actor holds game state)
Write to Redis (TTL = 24 hours or less, since 5-minute games exist)
Async persist to MongoDB (durability, replays)
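The three steps above can be sketched roughly like this. Plain dicts and a deque stand in for Redis and MongoDB, so none of this is Lichess's actual code, and `handle_move`, `is_valid`, and `drain_persist_queue` are hypothetical names.

```python
import time
from collections import deque

ACTIVE_TTL_SECONDS = 24 * 60 * 60  # upper bound from the text; short games can use less

live_games = {}          # game_id -> list of moves (the in-memory "actor" state)
hot_cache = {}           # stand-in for Redis: game_id -> (expires_at, moves)
persist_queue = deque()  # stand-in for the async writer's inbox
archive = {}             # stand-in for MongoDB: game_id -> moves


def handle_move(game_id, move, is_valid, now=None):
    now = time.time() if now is None else now
    moves = live_games.setdefault(game_id, [])
    if not is_valid(moves, move):                                  # 1. validate in memory
        return False
    moves.append(move)
    hot_cache[game_id] = (now + ACTIVE_TTL_SECONDS, list(moves))   # 2. cache with a TTL
    persist_queue.append((game_id, list(moves)))                   # 3. persist later
    return True


def drain_persist_queue():
    """Would run on a background worker; here it is called by hand."""
    while persist_queue:
        game_id, moves = persist_queue.popleft()
        archive[game_id] = moves
```

The player gets an answer after step 2; the durable write in step 3 happens off the hot path.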

Tradeoffs taken by turn-based games:

Constraint	Details
Priorities	Correctness, Cheat prevention, simplicity
Sacrifices	Instant feedback, real-time feel
Latency tolerance	100 - 400ms is acceptable
Scalability	Horizontal by game_id. Each game is independent, shard by game, not by player.
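Sharding by game_id, as in the Scalability row, can be as small as a stable hash. This is a sketch assuming a fixed number of shards; `shard_for` is a made-up helper, and a real deployment would likely want consistent hashing so that adding a shard doesn't move every game.

```python
import hashlib


def shard_for(game_id: str, shard_count: int) -> int:
    """Map a game to a shard. hashlib is used because Python's built-in
    hash() for strings is randomized per process by default."""
    digest = hashlib.sha256(game_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Both players of a game hash to the same shard because the key is the game, not the player.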

Resources & Deep Dives:

Pool 2 - Fighting, Shooter & Physics Games

The tradeoff: Responsiveness over Consistency (Availability >> Consistency)

In shooters and fighting games, you're reacting in milliseconds. Press W, you expect to move now. Click to shoot, the gun fires instantly. Waiting 100ms for server confirmation? Unplayable. You'd be controlling a character through molasses.

So these games make a radical bet: lie to the player (metaphorically), show them what they want to see, fix it later if you were wrong.

A quick aside on the term netcode: it is an umbrella term for all the networking techniques and algorithms that synchronize player actions, movement, and game state between clients and servers in online multiplayer games.

Before 2008: Delay-based netcode was the norm. It intentionally delays player actions to match network latency (so that both clients see the same game state), which made inputs feel slow and the game harder to play. In 2006, Tony Cannon developed rollback netcode under the name GGPO (Good Game, Peace Out), which was later open sourced.

Here's what rollback netcode actually does:

When you press a button, your client immediately simulates the action locally (client-side prediction) and sends your input over the network. In the client-server form described here, the server receives inputs from both players, advances the game state, and sends snapshots back to the clients. (GGPO itself is peer-to-peer: each client predicts the opponent's inputs instead of waiting for them.) If a client's prediction matches the authoritative state, nothing changes. If there's a mismatch (e.g., you got hit by an attack you didn't see), the client "rolls back" to the last correct state and replays any unacknowledged inputs. This way, players get instant feedback while the game state stays consistent across clients.
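Here's a minimal sketch of that predict, roll back, and replay loop, assuming 1D movement and integer positions. `Client`, `step`, `press`, and `on_snapshot` are names I made up; this is not GGPO's API.

```python
from dataclasses import dataclass, field


def step(pos: int, move: int) -> int:
    """Shared movement rule; client and server must run the same one."""
    return pos + move


@dataclass
class Client:
    pos: int = 0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # (seq, move) not yet acknowledged

    def press(self, move: int):
        """Predict locally right away and remember the input for later replay."""
        self.pos = step(self.pos, move)
        packet = (self.next_seq, move)
        self.pending.append(packet)
        self.next_seq += 1
        return packet  # what would be sent over the network

    def on_snapshot(self, server_pos: int, last_acked_seq: int):
        """Roll back to the authoritative state, then replay unacknowledged inputs."""
        self.pending = [(s, m) for s, m in self.pending if s > last_acked_seq]
        self.pos = server_pos
        for _, move in self.pending:
            self.pos = step(self.pos, move)
```

If the server rejected input 0 (say you were rooted), the client ends up at the server's position plus inputs 1 and 2, instead of snapping all the way back.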

According to its creator, the local prediction is going to be correct about 90% of the time, since players average only around 5 inputs per second.

Most fast-paced games develop their own netcode along similar lines. Here's an example of Valorant talking about its own netcode (Article Link)

From the Valorant Netcode article itself

In the same article they talk about an issue called peeker's advantage and how they mitigate it.

TLDR:

Valorant minimizes peeker's advantage through three approaches:

128-tick servers instead of 64-tick for more frequent state updates.
Riot Direct - their own ISP infrastructure for optimized routing and lower latency globally.
Reduced Client-Side Buffering to minimize delays.

They calculated peeker's advantage was ~141ms under typical conditions and reduced it to ~60-100ms through these optimizations. The goal is zero peeker's advantage, but it's physically impossible - they aim to make reaction time differences negligible (under 80ms feels "fair" to pros).

Sequence of Events for such fps or fighting games

The core techniques modern shooters use are Client-Side Prediction, Server Reconciliation, and Lag Compensation.

Client-Side Prediction: You press W, your client immediately simulates movement locally. Zero perceived lag. The input travels to the server (50-100ms). If the server agrees, nothing changes. If the server disagrees (you were slowed by an ability you didn't know about), your client snaps back to the server's position. That snap is called rubberbanding.
Server Reconciliation: The server is always authoritative. When your client gets a server snapshot at tick 1250, it compares: "I predicted position X, server says Y." On a mismatch, the client rewinds to the server state and re-applies any unacknowledged inputs sent after tick 1250.
Lag Compensation: When you shoot, you're aiming at where the enemy was 50-100ms ago (network delay), not where they currently are. The server rewinds game state to your timestamp and performs hit detection in that rewound state. If you hit in your timeline, the server validates it. This is why you die behind cover: on the shooter's screen (your past), you were still exposed.
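Lag compensation is the least intuitive of the three, so here is a rough sketch of the rewind step. It assumes 1D positions and no interpolation between ticks; `PositionHistory` and `is_hit` are hypothetical names, not from any engine.

```python
from bisect import bisect_right


class PositionHistory:
    """Keeps recent (tick, position) samples for one player, oldest first."""

    def __init__(self, max_samples: int = 128):
        self.ticks = []
        self.positions = []
        self.max_samples = max_samples

    def record(self, tick: int, pos: float):
        self.ticks.append(tick)
        self.positions.append(pos)
        if len(self.ticks) > self.max_samples:
            self.ticks.pop(0)
            self.positions.pop(0)

    def at(self, tick: int) -> float:
        """Position at the latest recorded tick <= `tick` (oldest sample if none)."""
        i = bisect_right(self.ticks, tick) - 1
        return self.positions[max(i, 0)]


def is_hit(history: PositionHistory, shot_tick: int, aim_pos: float,
           hit_radius: float = 0.5) -> bool:
    """Rewind the target to the shooter's tick and test the shot there."""
    return abs(history.at(shot_tick) - aim_pos) <= hit_radius
```

The same shot can hit at the shooter's tick and miss at the current tick, which is exactly the "died behind cover" case.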

Tradeoffs:

Constraint	Details
Priorities	Instant feedback, competitive fairness (low-latency advantage), smooth movement
Sacrifices	Perfect consistency (you see different things than opponent temporarily), trust in client (requires anti-cheat), complex reconciliation logic
Latency tolerance	<50ms ideal, 50-100ms playable, >150ms frustrating
Scalability	32-64 players max per server instance (physics + netcode overhead)

Resources & Deep Dives:

Pool 3 - MOBA (League of Legends, Dota 2)

The tradeoff: Hybrid Precision over Pure Speed (Consistency ≈ Availability)

MOBAs sit in the middle. Players average 1-2 clicks per second, as everything except movement clicks has cooldowns. Here the client predicts movement and the server owns combat.

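Instead of rollback, the starting point here is deterministic lockstep: every client runs an identical, deterministic simulation, and inputs are the only thing transmitted from the client. Note that the League breakdown further down is server-authoritative, so treat lockstep as the conceptual baseline rather than a literal description of League.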
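A toy version of deterministic lockstep, assuming integer-only state (floats can round differently across machines). `simulate`, `checksum`, and `run` are illustrative names only.

```python
import hashlib


def simulate(state: dict, inputs_for_tick: dict) -> dict:
    """Advance one tick. Players are processed in sorted order so every peer agrees."""
    new_state = dict(state)
    for player in sorted(inputs_for_tick):
        dx = inputs_for_tick[player]
        new_state[player] = new_state.get(player, 0) + dx
    return new_state


def checksum(state: dict) -> str:
    """Peers can exchange this to detect a desync without sending the full state."""
    canonical = repr(sorted(state.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def run(inputs_by_tick: list) -> dict:
    state = {}
    for tick_inputs in inputs_by_tick:
        state = simulate(state, tick_inputs)
    return state
```

Two peers fed the same input list end up with the same state and the same checksum, which is the whole trick: only inputs ever cross the wire.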

Simplest architecture I could show for explanation

Let's take a look at what League of Legends does:

30 ticks/second (33ms per tick). Lower than Valorant's 128-tick rate, since skillshot timing is more forgiving than gunfight timing.
Movement is client-predicted, with server validation every tick
Abilities are server-authoritative, with client-side input buffering
Collision/hit detection is also server-side
Solves the entity count problem by using delta compression

What is the entity count problem?

A MOBA match has 10 champions, 100+ minions, jungle monsters, wards, pets, and projectiles in flight. That's hundreds of entities. Sending full state updates for everything every tick would kill bandwidth. League solves this with delta compression: only send what changed since the last snapshot. If Champion A moved and Minion 5 took damage, send only those two updates. If nothing changed on tick 1001, send an empty delta. This keeps bandwidth manageable even with complex state. Additionally, League uses interest management, only sending updates for entities your team has vision of. No vision, no updates, which prevents maphacks and reduces network load.

// What that would look like

Tick 1000: Champion A moved, Minion 5 took damage

→ Send: {championA: {pos: [x,y]}, minion5: {hp: 450}}

Tick 1001: Nothing changed

→ Send: {} (empty delta)
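Computing such a delta is mostly a dictionary diff. Below is a minimal sketch, assuming snapshots are dicts of entity id to fields and ignoring entity removal; `make_delta` and `apply_delta` are hypothetical names, not Riot's code.

```python
def make_delta(prev: dict, curr: dict) -> dict:
    """Return only the fields that changed between two snapshots."""
    delta = {}
    for entity_id, fields in curr.items():
        old = prev.get(entity_id, {})
        changed = {k: v for k, v in fields.items()
                   if k not in old or old[k] != v}
        if changed:
            delta[entity_id] = changed
    return delta


def apply_delta(state: dict, delta: dict) -> dict:
    """Client side: merge a delta into the last known snapshot."""
    new_state = {eid: dict(fields) for eid, fields in state.items()}
    for entity_id, changed in delta.items():
        new_state.setdefault(entity_id, {}).update(changed)
    return new_state
```

A tick where nothing changed produces `{}`, matching the empty delta in the example above.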

Tradeoffs:

Constraint	Details
Priorities	Combat correctness, cheat prevention (vision hacks, cooldown manipulation), ability interaction consistency
Sacrifices	Movement feels slightly delayed compared to shooters (30-50ms input buffer), higher ping players have noticeable disadvantage in skillshot dodging.
Latency tolerance	30-80ms ideal, 80-120ms playable, >150ms frustrating for carries
Scalability	10 players per game, ~100-200 entities total. Server can handle this at 30Hz without distributed state.

Resources & Deep Dives:

Determinism in League of Legends (4 Part Blog Chain)

Pool 4 - MMOs (WoW, RuneScape, FFXIV)

The tradeoff: Scale over Synchronization (Availability >> Consistency)

MMOs are a completely different beast. Thousands of players in the same persistent world, simultaneously. A capital city might have 500+ players standing around, trading, chatting, dueling. A world boss raid? 40-100 players coordinating attacks.

You cannot synchronize that many players tightly. Physics won't allow it. So MMOs make a radical compromise: eventual consistency everywhere, tight sync nowhere.

Here's how they do it:

Architecture so ancient even AI struggles to find active sources

Sharding and Instancing: The first solution is simple: don't put everyone in the same physical space. Split the world.
- Sharding: Multiple copies of the same world running on different servers. Each realm is an isolated copy.
- Instancing: Dungeons and raids are separate server instances. When your 5-player party enters a dungeon, the server spins up a dedicated instance.
Phasing: Different players see different versions of the same zone based on quest progress. Your friend hasn't done the quest yet, so they still see the village burning while you see it saved. Each phase is handled by a different server process.
Interest Management (Aggressive Filtering): Even within a shard, the server doesn't tell you about every player. If someone is 200 meters away, you don't get updates. The server maintains an "area of interest" and only broadcasts updates for entities within that radius.
Eventual Consistency in Combat: Unlike MOBAs where every damage tick is authoritative, MMOs accept delays and approximations. Your client shows the cast bar, plays the animation. The spell "fires" on your screen immediately. Server checks: "Do you have mana? Is it off cooldown? Is the target still in range?" Then applies damage. But here's the difference: the target might have moved. The server doesn't rewind time. It just checks current state.

This is why tab-target systems dominate MMOs. Skillshots require tight synchronization. Tab-target accepts "you clicked them 200ms ago, they're still valid now" approximations.
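The area-of-interest filter from the list above boils down to a distance check per observer. A naive sketch, assuming 2D positions and a 100-unit radius I picked arbitrarily; `visible_entities` is a made-up helper, and a real server would likely use a spatial grid instead of scanning every entity.

```python
import math


def visible_entities(observer, entities, radius=100.0):
    """Return ids of entities within `radius` of the observer, sorted for stable output."""
    ox, oy = observer
    return sorted(
        eid for eid, (x, y) in entities.items()
        if math.hypot(x - ox, y - oy) <= radius
    )
```

Only the returned ids get position updates for that player; everyone else is simply never mentioned to their client.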

The Database Bottleneck

Nearly every meaningful action in an MMO writes to a database: loot drops, quest completion, inventory changes, gold transactions. These aren't ephemeral like a League match; they persist forever. This creates a fundamental scalability limit: combat simulation can scale horizontally, but the database is a single source of truth.

Solution they probably use: aggressive caching, async writes, and accepting that some data loss is tolerable (your /emote command doesn't need ACID guarantees).
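Since this is my guess at what they do, here is an equally speculative sketch of a write-behind cache. A dict stands in for the database, and `WriteBehindStore` with its `critical` flag is entirely hypothetical.

```python
class WriteBehindStore:
    def __init__(self, db: dict, batch_size: int = 3):
        self.db = db              # stand-in for the real database
        self.cache = {}           # latest value per key, served to reads
        self.dirty = set()        # keys changed since the last flush
        self.batch_size = batch_size

    def write(self, key, value, critical=False):
        self.cache[key] = value
        self.dirty.add(key)
        # Critical writes (gold, loot) flush right away, along with anything
        # else pending; everything else waits until a batch fills up.
        if critical or len(self.dirty) >= self.batch_size:
            self.flush()

    def read(self, key):
        return self.cache.get(key, self.db.get(key))

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

If the process crashes before a flush, the pending non-critical writes are lost, which is the "some data loss is tolerable" part of the tradeoff.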

Tradeoffs Taken by MMOs:

Constraint	Details
Priorities	Massive scale (1000+ concurrent players per shard), persistent world state, social features
Sacrifices	Combat feels sluggish (200ms+ ability delays common), no twitch mechanics, eventual consistency
Latency tolerance	100-200ms playable, 200-400ms annoying but functional, >500ms barely playable
Scalability	Horizontal via sharding/instancing, but limited by database writes and interest management overhead

Resources & Deep Dives:

This blog took around 4 days of procrastination, 2 days of reading, 5 hours of writing and 8 hours of diagramming.