Seb Zuddas Wiki

Sharding

Jan 22, 2026 (2 min read)

Why Shard a Database?

Shard when you outgrow a single database:
- You need more writes per second than one database can handle.
- Storage is approaching a limit.
- The database in general is straining.
Sharding scales horizontally, meaning more databases rather than bigger databases.

What is Sharding

Sharding means splitting the data across multiple databases (shards).
Each shard holds a subset of the data, and the shards together form the full dataset.
As you grow, you add more shards, not larger databases.

How to Shard Data

This raises two questions:

What to shard by (shard key - how to group data)
How to distribute data (distribution strategy - how to map each key to a shard)

Choosing Shard Keys

Good shard keys

High cardinality (a field with lots of unique values)
Even distribution (values should spread naturally, so each shard holds a similar amount of data)
Aligns with queries (a typical query hits one shard)
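
A rough way to check cardinality and distribution for a candidate key is to count how many rows each shard would receive. The sketch below is illustrative only: the rows are synthetic, and `SHARD_COUNT`, `shard_of` and `rows_per_shard` are made-up names, not part of any real system.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # assumed number of shards


def shard_of(value: str) -> int:
    """Map a key value to a shard number using a stable hash."""
    digest = hashlib.sha256(value.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


# Synthetic rows: 1,000 users, 90% of them in a single country.
rows = [
    {"user_id": str(i), "country": "US" if i % 10 else "NZ"}
    for i in range(1000)
]


def rows_per_shard(field: str) -> Counter:
    """Count how many rows land on each shard when sharding by `field`."""
    return Counter(shard_of(row[field]) for row in rows)


by_user = rows_per_shard("user_id")     # high cardinality: spreads over all shards
by_country = rows_per_shard("country")  # two distinct values: at most two shards used

print("by user_id:", dict(by_user))
print("by country:", dict(by_country))
```

With only two distinct countries, at least two of the four shards stay empty and one shard takes 900 or more of the 1,000 rows, which is the skew a low-cardinality key produces.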

Good examples

Split by user ID (shard 1 has users 0 to 10m; shard 2 has users 10m to 20m; etc.)
Split by order ID (shard 1 has orders 0 to 10m; shard 2 has orders 10m to 20m; etc.)
Think about what users request most frequently, so those requests can be served by a single shard.
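
The user ID split can be sketched as a lookup over range boundaries. This is a minimal sketch, assuming three shards of 10 million IDs each with exclusive upper bounds. The shard names and the `shard_for_user` helper are hypothetical.

```python
from bisect import bisect_right

# Assumed layout: each shard owns 10 million consecutive user IDs.
# Upper bounds are exclusive, so shard-1 holds IDs 0..9_999_999.
RANGE_UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def shard_for_user(user_id: int) -> str:
    """Return the name of the shard whose range contains user_id."""
    if user_id < 0 or user_id >= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    # bisect_right counts how many upper bounds are <= user_id,
    # which is exactly the index of the owning shard.
    return SHARD_NAMES[bisect_right(RANGE_UPPER_BOUNDS, user_id)]


print(shard_for_user(42))          # shard-1
print(shard_for_user(10_000_000))  # shard-2
```

Growing the dataset means appending a new upper bound and shard name. Existing ranges stay where they are.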

What to Distribute By

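Range-based (user IDs, order IDs).
Hash-based distribution.
1. Take a unique ID, hash it (to randomise) and take the result modulo the number of databases you have. The problem is that if you add a new shard, the modulus changes, so most keys map to a different shard and their data has to be moved.
Consistent hashing.
1. The coin spinning analogy.
2. Eliminates the giant reshuffling problem: adding a shard only moves the keys that land on the new shard.
Directory-based sharding.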
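
To make the reshuffling problem concrete, the sketch below counts how many keys change shard when going from 4 to 5 shards, first with hash-and-modulo and then with a simple hash ring. It is an illustration under assumptions, not a production design: `stable_hash`, `modulo_shard`, `HashRing`, the shard names and the choice of 100 virtual nodes per shard are all made up for this example.

```python
import hashlib
from bisect import bisect_right


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, shard_count: int) -> int:
    """Hash-based distribution: hash the key, then take it modulo the shard count."""
    return stable_hash(key) % shard_count


class HashRing:
    """Consistent hashing: shards and keys are hashed onto the same circle."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed at `vnodes` points on the ring to even out the load.
        self._points = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first shard point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"user-{i}" for i in range(10_000)]

# Going from 4 to 5 shards with modulo hashing.
moved_modulo = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys)

# Going from 4 to 5 shards with a hash ring.
old_ring = HashRing(["s0", "s1", "s2", "s3"])
new_ring = HashRing(["s0", "s1", "s2", "s3", "s4"])
moved_ring = sum(old_ring.shard_for(k) != new_ring.shard_for(k) for k in keys)

print(f"modulo: {moved_modulo / len(keys):.0%} of keys moved")
print(f"ring:   {moved_ring / len(keys):.0%} of keys moved")
```

With modulo hashing a key stays put only when its hash gives the same remainder for 4 and for 5, so in expectation about four in five keys move. On the ring, the only keys that move are the ones the new shard takes over, which is roughly its fair share of one in five.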

