Podcast: Play in new window | Download
Subscribe: Apple Podcasts | Spotify | TuneIn | RSS
We dive back into Designing Data-Intensive Applications to learn more about replication while Michael thinks cluster is a three syllable word, Allen doesn’t understand how we roll, and Joe isn’t even paying attention.
For those that like to read these show notes via their podcast player, we like to include a handy link to get to the full version of these notes so that you can participate in the conversation at https://www.codingblocks.net/episode160.
Sponsors
- Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
- Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines.
- Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription.
Survey Says
News
- Thank you to everyone that left us a new review:
- Audible: Ashfisch, Anonymous User (aka András)
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair Douglas Adams
Douglas Adams
Replication in Distributed Systems
- When we talk about replication, we are talking about keeping copies of the same data on multiple machines connected by a network
- For this episode, we’re talking about data small enough that it can fit on a single machine
- Why would you want to replicate data?
- Keeping data close to where it’s used
- Increase availability
- Increase throughput by allowing more access to the data
- Data that doesn’t change is easy, you just copy it
- 3 popular algorithms
- Single Leader
- Multi-Leader
- Leaderless
- Well established (1970’s!) algorithms for dealing with syncing data, but a lot data applications haven’t needed replication so the practical applications are still evolving
- Cluster group of computers that make up our data system
- Node each computer in the cluster (whether it has data or not)
- Replica each node that has a copy of the database
- Every write to the database needs to be copied to every replica
- The most common approach is “leader based replication”, two of the algorithms we mentioned apply
- One of the nodes is designated as the “leader”, all writes must go to the leader
- The leader writes the data locally, then sends to data to it’s followers via a “replication log” or “change stream”
- The followers tail this log and apply the changes in the same order as the leader
- Reads can be made from any of the replicas
- This is a common feature of many databases, Postgres, Mongo, it’s common for queues and some file systems as well
Synchronous vs Asynchronous Writes
- How does a distributed system determine that a write is complete?
- The system could hang on till all replicas are updated, favoring consistency…this is slow, potentially a big problem if one of the replicas is unavailable
- The system could confirm receipt to the writer immediately, trusting that replicas will eventually keep up… this favors availability, but your chances for incorrectness increase
- You could do a hybrid, wait for x replicas to confirm and call it a quorum
- All of this is related to the CAP theorem…you get at most two: Consistency, Availability and Partition Tolerance
- Site Note: Can you ever have Consistent/Available databases? https://codahale.com/you-cant-sacrifice-partition-tolerance/
- The book mentions “chain replication” and other variants, but those are still rare
- Example: Chain replication in Mongo: https://docs.mongodb.com/manual/tutorial/manage-chained-replication/
Steps for Adding New Followers
- Take a consistent snapshot of the leader at some point in time (most db can do this without any sort of lock)
- Copy the snapshot to the new follower
- The follower connects to the leader and requests all changes since the back-up
- When the follower is fully caught up, the process is complete
Handling Outages
- Nodes can go down at any given time
- What happens if a non-leader goes down?
- What does your db care about? (Available or Consistency)
- Often Configurable
- When the replica becomes available again, it can use the same “catch-up” mechanism we described before when we add a new follower
- What happens if you lose the leader?
- Failover: One of the replicas needs to be promoted, clients need to reconfigure for this new leader
- Failover can be manual or automatic
Rough Steps for Failover
- Determining that the leader has failed (trickier than it sounds! how can a replica know if the leader is down, or if it’s a network partition?)
- Choosing a new leader (election algorithms determine the best candidate, which is tricky with multiple nodes, separate systems like Apache Zookeeper)
- Reconfigure: clients need to be updated (you’ll sometimes see things like “bootstrap” services or zookeeper that are responsible for pointing to the “real” leader…think about what this means for client libraries…fire and forget? try/catch?
Failover is Hard!
- How long do you wait to declare a leader dead?
- What if the leader comes back? What if it still thinks it’s leader? Has data the others didn’t know about? Discard those writes?
- Split brain – two replicas think they are leaders…imagine this with auto-incrementing keys… Which one do you shut down? What if both shut down?
- There are solutions to these problems…but they are complex and are a large source of problems
- Node failures, unreliable networks, tradeoffs around consistency, durability, availability, latency are fundamental problems with distributed systems
Implementation of Replication Logs
- 3 main strategies for replication, all based around followers replaying the same changes
Statement-Based Replication
- Leader logs every Insert, Update, Delete command, and followers execute them
- Problems
- Statements like NOW() or RAND() can be different
- Auto-increments, triggers depend on existing things happen in the exact order..but db are multi-threaded, what about multi-step transactions?
- What about LSM databases that do things with delete/compaction phases?
- You can work around these, but it’s messy – this approach is no longer popular
- Example, MySQL used to do it
Write Ahead Log Shipping
- LSM and B-Tree databases keep an append only WAL containing all writes
- Similar to statement-based, but more low level…contains details on which bytes change to which disk blocks
- Tightly coupled to the storage engine, this can mean upgrades require downtime
- Examples: Postgres, Oracle
Row Based Log Replication
- Decouples replication from the storage engine
- Similar to WAL, but a litle higher level – updates contain what changed, deletes similar to a “tombstone”
- Also known as Change Data Capture
- Often seen as an optional configuration (Sql Server, for example)
- Examples: (New MySQL/binlog)
Trigger-Based Replication
- Application based replication, for example an app can ask for a backup on demand
- Doesn’t keep replicas in sync, but can be useful
Resources We Like
- Other Episodes on “Designing Data Intensive Applications
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
- You Can’t Sacrifice Partition Tolerance (codahale.com)
- Manage Chained Replication (docs.mongodb.com)
- Doug DeMuro’s YouTube channel (YouTube)
- Apache ZooKeeper (Wikipedia, Apache)
Tip of the Week
- A collection of CSS generators for grid, gradients, shadows, color palettes etc. from Smashing Magazine.
- Learn This One Weird ? Trick To Debug CSS (freecodecamp.org)
- Previously mentioned in episode 81.
- Use
tree
to see a visualization of a directory structure from the command line. Install it in Ubuntu viaapt install tree
. (manpages.ubuntu.com) - Initialize a variable in Kotlin with a try-catch expression, like
val myvar: String = try { ... } catch { ... }
. (Stack Overflow) - Manage secrets and protect sensitive data (and more with Hashicorp Vault. (Hashicorp)