In this episode, Allen is back, Joe knows his maff, and Michael brings the jokes, all that and more as we discuss the internals of how databases store and retrieve the data we save as we continue our deep dive into Designing Data-Intensive Applications.
If you’re reading these show notes via your podcast player, did you know that you can find them at https://www.codingblocks.net/episode127? Well you do now! Check it out and join in the conversation.
Sponsors
- Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
- Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription.
- Clubhouse – The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks.
Survey Says
News
- We thank all of the awesome people that left us reviews:
- iTunes: TheLunceforce, BrianMorrisonMe, Collectorofmuchstuff, Momentum Mori, brianbrifri, Isyldar, James Speaker
- Stitcher: adigolee
- Come see Allen, Joe, and Michael in person at the 15th Annual Orlando Code Camp & Tech Conference, March 28th. Sign up for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com)
Database Storage and Retrieval
- A database is a collection of data.
- A database management system (DBMS) includes the database itself plus the APIs for managing the data and controlling access to it.
RDBMS Storage Data Structures
- Generally speaking, data is written to a log in an append only fashion, which is very efficient.
- Log: an append-only sequence of records; this doesn’t have to be human readable.
- These write operations are typically pretty fast because writing to the end of a file is generally a very fast operation.
- Reading the value for a key is much more expensive though, as the entire file has to be scanned for instances of that key (see the sketch after this list).
- To solve this problem, there are indexes.
- Generally speaking, an index is an additional structure, derived from the primary data, that is stored in a different way so certain reads can be answered quickly.
- Having indices incurs additional overhead on writes. You’re no longer just writing to the primary data file, but you’re also keeping the indices up to date at the same time.
- This is a trade-off you incur in databases: indexes speed up reads but slow down writes.
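To make the append-only log idea concrete, here is a minimal sketch of a key-value "database" backed by a single append-only file. The file name and the comma-separated, line-per-record format are assumptions for illustration only: writes just append to the end of the file, while reads have to scan every record to find the last value written for a key.

```python
from typing import Optional

# Toy append-only key-value store. The file name and the "key,value" line
# format are assumptions made purely for illustration.
DB_FILE = "database.log"

def db_set(key: str, value: str) -> None:
    # Writes are cheap: we only ever append to the end of the file.
    with open(DB_FILE, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")

def db_get(key: str) -> Optional[str]:
    # Reads are expensive: scan every record and keep the last match,
    # because later appends supersede earlier ones.
    result = None
    try:
        with open(DB_FILE, "r", encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    result = v
    except FileNotFoundError:
        pass
    return result

if __name__ == "__main__":
    db_set("42", '{"name": "San Francisco"}')
    db_set("42", '{"name": "SF", "attractions": ["Golden Gate Bridge"]}')
    print(db_get("42"))  # prints the most recent value written for key 42
```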
Hash Indexes
- One possible solution is to keep an in-memory hash map of every key to the byte offset where its value is stored in the data file (see the sketch after this list).
- This is what Bitcask, the default storage engine for Riak, does.
- The system must have enough RAM to hold the entire index, though.
- In the example given, all the keys stay in memory while the data file is only ever appended to, so a key’s offset changes frequently; this is still very efficient because the index only ever stores a pointer to the location of the value.
- If you’re always writing to a file, aren’t you going to run out of disk space?
- File segmenting / compaction solves this.
- Duplicate keys in a given file are compacted to store just the last value written for the key, and those values are written to a new file.
- This typically happens on a background thread.
- Once the new segment file has been created by merging in the changes from the previous segments, it becomes the new “live” log file.
- While the background thread is building the new segment, key lookups are still served from the old segment files, so reads and writes aren’t blocked.
- After the new segment file creation is completed, the old segment files can be deleted.
- This is similar to how Kafka’s log compaction (a topic cleanup policy) works, and to what happens when you run a “force merge” on an Elasticsearch index (the same goes for similar systems).
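Below is a rough sketch of this scheme in Python, not Bitcask itself; the class name, file handling, and the naive "key,value" line format are all assumptions for illustration. An in-memory dict maps each key to the byte offset of its latest record in an append-only segment file, and a simple compaction pass rewrites only the most recent value for each key into a new segment before swapping it in.

```python
import os
from typing import Dict, Optional

class HashIndexedLog:
    """Bitcask-style sketch: append-only segment file plus an in-memory
    hash index of key -> byte offset. Illustrative only; a naive
    "key,value" line format breaks if keys contain commas or newlines,
    which is one reason real engines use a binary format instead."""

    def __init__(self, path: str):
        self.path = path
        self.index: Dict[str, int] = {}   # key -> byte offset of latest record
        open(self.path, "ab").close()     # make sure the segment file exists

    def set(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.tell()             # append-only: record starts at end of file
            f.write(record)
        self.index[key] = offset          # the newest write wins

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                # jump straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        _, _, value = line.partition(",")
        return value

    def compact(self, tmp_path: str) -> None:
        # Keep only the latest value for each key: write those records into a
        # new segment, swap it in, and point the index at the new offsets.
        new_index: Dict[str, int] = {}
        with open(tmp_path, "wb") as out:
            for key in list(self.index):
                value = self.get(key)
                new_index[key] = out.tell()
                out.write(f"{key},{value}\n".encode("utf-8"))
        os.replace(tmp_path, self.path)   # old segment can now be discarded
        self.index = new_index

if __name__ == "__main__":
    log = HashIndexedLog("segment-0.log")
    log.set("user:42", "Allen")
    log.set("user:42", "Joe")            # supersedes the previous record
    print(log.get("user:42"))            # "Joe", found via a single seek
    log.compact("segment-0.log.tmp")     # keeps only the latest value per key
```

A real engine would split the log into multiple fixed-size segments and merge them on a background thread while reads keep hitting the old segments; this sketch compacts a single file in the foreground just to show the mechanics.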
- Some key factors in making this work well:
- File format
- CSV is not a great format for logs. Typically you want a binary format that encodes the length of the string in bytes, followed by the raw string itself (see the record-format sketch after this list).
- Deleting records requires some special attention
- You have to append a special tombstone record to the file; during the merge process, the tombstone tells the merge to discard any earlier values for that key.
- Crash recovery
- If the server crashes, the in-memory hash maps are lost, and rebuilding them by re-scanning large segments or lots of key/value pairs can take a while.
- Bitcask makes this faster by snapshotting the in-memory hashes on occasion so that starting back up can be faster.
- Incomplete record writes
- Bitcask files include checksums so any corruption in the logs can be ignored.
- Concurrency control
- It’s common for there to only be one writer thread, but multiple reader threads, since written data is immutable.
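Here is a sketch of a length-prefixed binary record format with a CRC32 checksum and a tombstone marker, tying several of the factors above together. The exact field layout (crc32, value length, key length, then the key and value bytes) is an assumption for illustration, not Bitcask's actual on-disk format.

```python
import struct
import zlib
from typing import Optional, Tuple

# Record layout (assumed for illustration):
#   crc32 (4 bytes) | value length (4 bytes, -1 = tombstone) | key length (4 bytes) | key | value
HEADER = struct.Struct(">IiI")
TOMBSTONE = -1

def encode_record(key: bytes, value: Optional[bytes]) -> bytes:
    # value=None writes a tombstone; the checksum covers everything after the crc field.
    body = key + (value or b"")
    value_len = TOMBSTONE if value is None else len(value)
    header_wo_crc = struct.pack(">iI", value_len, len(key))
    crc = zlib.crc32(header_wo_crc + body)
    return struct.pack(">I", crc) + header_wo_crc + body

def decode_record(buf: bytes) -> Tuple[bytes, Optional[bytes]]:
    crc, value_len, key_len = HEADER.unpack_from(buf, 0)
    body_len = key_len + (0 if value_len == TOMBSTONE else value_len)
    end = HEADER.size + body_len
    # An incomplete or corrupted write fails the checksum and can be skipped.
    if zlib.crc32(buf[4:end]) != crc:
        raise ValueError("corrupt record")
    key = buf[HEADER.size:HEADER.size + key_len]
    if value_len == TOMBSTONE:
        return key, None              # tombstone: the merge should drop this key
    return key, buf[HEADER.size + key_len:end]

rec = encode_record(b"user:42", b"Coding Blocks")
print(decode_record(rec))                              # (b'user:42', b'Coding Blocks')
print(decode_record(encode_record(b"user:42", None)))  # (b'user:42', None)  <- tombstone
```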
Why not update the file in place, instead of only appending to it?
- Appending and merging are sequential write operations, which are much faster than random writes on spinning disks and to some extent on SSDs as well.
- Concurrency and crash recovery are much simpler.
- Merging old segments is a convenient and unintrusive way to avoid fragmentation.
Downsides to Hash Indexes
- The hash table must fit in memory; if it spills over to disk, a hash table performs poorly because of all the random access I/O.
- Range queries are not efficient; you have to look up each key individually (see the sketch below).
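To see why ranges hurt, here is a tiny sketch (with hypothetical keys and offsets) comparing a hash index, where a range query has to examine every key, with a sorted list of keys, where you can binary-search to the start of the range and scan forward; keeping keys sorted is where structures like SSTables and B-trees come in.

```python
import bisect

# Hypothetical index contents, purely to illustrate the range-query problem.
hash_index = {"user:1000": 0, "user:1003": 37, "user:1999": 74, "user:2500": 111}

# A hash index has no key ordering, so a range query has to examine every key
# (or probe every possible key that might fall in the range).
in_range_via_hash = [k for k in hash_index if "user:1000" <= k <= "user:1999"]

# With keys kept in sorted order, binary-search to the start of the range and
# scan forward, touching only the keys you actually want.
sorted_keys = sorted(hash_index)  # a sorted index structure would maintain this
lo = bisect.bisect_left(sorted_keys, "user:1000")
hi = bisect.bisect_right(sorted_keys, "user:1999")
in_range_via_sorted = sorted_keys[lo:hi]

print(in_range_via_hash, in_range_via_sorted)
```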
Resources We Like
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
- Grokking the System Design Interview (Educative.io)
Tip of the Week
- Add authentication to your applications with minimum fuss using KeyCloak. (keycloak.org)
- Master any major city’s transit system like a boss with CityMapper. (citymapper.com)
- Spin up a new VM with a single command using Multipass. (GitHub)
- We referenced Stefan Scherer’s Docker images again. (episode 80, Docker, GitHub)
- Random User Generator – like Lorem Ipsum, but for people. (randomuser.me)
- The perfect gifts for that nerd in your life. (remembertheapi.com)
- Git Cheat Sheet coffee mug
- Use CTRL+SHIFT+O in Chrome’s Sources tab to navigate to your JavaScript function by name.
- tabs AND spaces – A new podcast that talks about the topics that developers care about. (tabsandspaces.io)