Designing Data-Intensive Applications – Data Models: Relationships

While we continue to dig into Designing Data-Intensive Applications, we take a step back to discuss data models and relationships as Michael covers all of his bases, Allen has a survey answer just for him, and Joe really didn’t get his tip from Reddit.

In case you’re reading this via your podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode124, where you can be a part of the conversation.

Survey Says

News

Thank you for the awesome reviews:
- iTunes: Kampfirez, Ameise776, JozacAlanOutlaw, skmetzger, Napalm684, Dingus the First
Get your tickets now for NDC { London }, January 27th – 31st, where Allen will be giving his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams, and where you can kick him in the shins. (ndc-london.com)
Hurry and sign up for the South Florida Software Developers Conference 2020, February 29th, where Joe will be giving his talk, Streaming Architectures by Example. This is a great opportunity for you to try to kick him in the shins. (fladotnet.com)
The CB guys will be at the 15th Annual Orlando Code Camp & Tech Conference, March 28th. Sign up for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com)

Relationships … It’s complicated

Normalization

Relational databases are typically normalized.
- A quick description of normalization would be associating meaningful data with a key and then relating data by keys rather than storing all of the data together.
Normalization reduces redundancy and improves data integrity.
Relational normalization has several benefits:
- Consistent styling and spelling for meaningful values.
- No ambiguity, even when text values are coincidentally the same, for example, Georgia the state vs Georgia the country.
- Updating meaningful values is easy since there is only one spot to change.
- Language localization support can be easier because you can associate different meaningful values with the same key for each supported language.
- Search for hierarchical relationships can be easier, for example, getting a list of cities for a particular state.
  - This can vary based on how the data is stored. See episode 28 and episode 29 for more detailed discussions related to some strategies.
There are legitimate reasons for having denormalized data in a relational database, like faster searches, although there might be better tools for the specific use case.
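
As a rough illustration of the normalization benefits listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows (region, customer, and so on) are made up for this example and are not from the episode. It shows two rows that share the text "Georgia" but have different keys, and a rename that only touches one spot.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,   -- 'state' or 'country'
        name TEXT NOT NULL
    );
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES region(id)
    );
    INSERT INTO region (id, kind, name) VALUES
        (1, 'state',   'Georgia'),
        (2, 'country', 'Georgia');
    INSERT INTO customer (id, name, region_id) VALUES
        (1, 'Ada',   1),
        (2, 'Grace', 1),
        (3, 'Linus', 2);
""")

# Same text, different keys: customers in Georgia the state only.
state_customers = [row[0] for row in conn.execute(
    "SELECT c.name FROM customer c "
    "JOIN region r ON r.id = c.region_id "
    "WHERE r.id = 1 ORDER BY c.name"
)]

# Changing the meaningful value happens in exactly one row.
conn.execute("UPDATE region SET name = 'Georgia (US)' WHERE id = 1")
labels = conn.execute(
    "SELECT c.name, r.name FROM customer c "
    "JOIN region r ON r.id = c.region_id ORDER BY c.id"
).fetchall()

print(state_customers)
print(labels)
```

Every customer that points at key 1 picks up the new label without any customer row being touched.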

Relationships …

In Document Databases

Document databases struggle as relationships get more complicated.
Document database designers have to make careful decisions about where data will be stored.
A big benefit of document databases is locality, meaning all of the relevant data for an entity is stored in one spot.
- Fetching an order object is one simple get in a document database, while a relational database might need more than one query and will almost certainly join multiple tables.
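
To make the locality point a bit more concrete, here is a small sketch in Python. A plain dict of JSON strings stands in for a document database (an assumption for illustration, not a real document store API), and sqlite3 stands in for the relational side. The order, customer, and order_item names are hypothetical.

```python
import json
import sqlite3

# Document style: the whole order lives in one JSON blob under one key.
# The dict is only a stand-in for a document database.
document_store = {
    "order:1": json.dumps({
        "id": 1,
        "customer": "Ada",
        "items": [
            {"sku": "KB-01", "qty": 1},
            {"sku": "MS-02", "qty": 2},
        ],
    })
}
order_doc = json.loads(document_store["order:1"])  # one simple get

# Relational style: the same order is spread across three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    CREATE TABLE order_item (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku      TEXT NOT NULL,
        qty      INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO orders VALUES (1, 1);
    INSERT INTO order_item VALUES (1, 'KB-01', 1), (1, 'MS-02', 2);
""")

# Rebuilding the order object takes a join across all three tables.
rows = conn.execute("""
    SELECT c.name, i.sku, i.qty
    FROM orders o
    JOIN customer c ON c.id = o.customer_id
    JOIN order_item i ON i.order_id = o.id
    WHERE o.id = 1
    ORDER BY i.sku
""").fetchall()
order_rel = {
    "id": 1,
    "customer": rows[0][0],
    "items": [{"sku": sku, "qty": qty} for _, sku, qty in rows],
}

print(order_rel == order_doc)
```

Both paths end up with the same object. The difference is how much work it takes to assemble it.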

In Relational Databases

There are several benefits of relational database relationships, particularly Many-to-One and Many-to-Many relationships.
- To illustrate a Many-to-One example, there are many parts associated with one particular computer.
- To illustrate a Many-to-Many example, a person can be associated with many computers and a computer can be associated with many people.
As your product matures, your database (typically) gets more complicated. The relational model holds up really well to these changes over time. The queries get more complicated as you add more relationships, but your flexibility remains.
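
The parts, computers, and people examples above could be modeled something like this. This is only a sketch with sqlite3, and the table names, column names, and sample data are assumptions made for illustration. Many-to-One is a foreign key on the "many" side, and Many-to-Many goes through a join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE computer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Many-to-One: many parts point at one computer.
    CREATE TABLE part (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        computer_id INTEGER NOT NULL REFERENCES computer(id)
    );

    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Many-to-Many: a join table pairs people with computers.
    CREATE TABLE person_computer (
        person_id   INTEGER NOT NULL REFERENCES person(id),
        computer_id INTEGER NOT NULL REFERENCES computer(id),
        PRIMARY KEY (person_id, computer_id)
    );

    INSERT INTO computer VALUES (1, 'desktop'), (2, 'laptop');
    INSERT INTO part VALUES (1, 'CPU', 1), (2, 'GPU', 1), (3, 'SSD', 2);
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO person_computer VALUES (1, 1), (1, 2), (2, 2);
""")

# All of the parts for one computer.
desktop_parts = [row[0] for row in conn.execute(
    "SELECT name FROM part WHERE computer_id = 1 ORDER BY name"
)]

# All of the computers for one person.
ada_computers = [row[0] for row in conn.execute(
    "SELECT c.name FROM computer c "
    "JOIN person_computer pc ON pc.computer_id = c.id "
    "WHERE pc.person_id = 1 ORDER BY c.name"
)]

# The same join table answers the question from the other direction.
laptop_people = [row[0] for row in conn.execute(
    "SELECT p.name FROM person p "
    "JOIN person_computer pc ON pc.person_id = p.id "
    "WHERE pc.computer_id = 2 ORDER BY p.name"
)]

print(desktop_parts, ada_computers, laptop_people)
```

Adding a new relationship later usually means adding another table or foreign key, which is part of why the model tends to hold up as things get more complicated.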

Query Optimization

A query optimizer, a common part of popular RDBMSes, is responsible for deciding which parts of your written query to execute in which order and which indexes to use.
The query optimizer has a huge impact on performance and is a big part of the reason why proprietary RDBMSes like Oracle and SQL Server are so popular.
- Imagine if you, the developer, had to be smarter about the order that you joined your tables and the order of items in your WHERE clause …
  - and then ratios of data in the tables were different in production vs development,
  - and then a new index was added, …
The query optimizer uses advanced statistics about your data to make smart choices about how to execute your query.
A key insight into the relational model is that the query optimizer only has to be built once and everybody benefits from it.
In document databases, the developers and data model designers have to consider their designs and querying constantly.
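
One way to watch a query optimizer change its mind is SQLite's EXPLAIN QUERY PLAN. In this sketch the orders table, the idx_orders_customer index, and the sample data are all hypothetical. The query text never changes, but the plan the optimizer picks does once an index exists. The exact wording of the plan text varies between SQLite versions, so it is printed here rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the optimizer's plan as text; the detail is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (7,))
result_before = sorted(conn.execute(QUERY, (7,)).fetchall())

# A new index shows up; the query itself is not rewritten.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = plan(QUERY, (7,))
result_after = sorted(conn.execute(QUERY, (7,)).fetchall())

print("before:", plan_before)
print("after: ", plan_after)
```

The results are the same either way. Only the way the database gets to them changes, and the developer did not have to do anything to the query.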

How to choose Document vs Relational

Document Databases …

Have better performance in some use cases because of locality.
Often scale very well because of that same locality.
Are flexible in what they can store, often called “schemaless” or “schema on read”, but put another way, this is a lack of enforced integrity.
Have poor support for joining because you have to fetch the whole document for a simple lookup.
Require extra care when designing because it’s difficult to change the document formats after the fact and because there is no generic query optimizer available.
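
Here is a small sketch of what "schema on read" can look like in practice, next to an enforced schema. The documents, the read_user helper, and the user table are all made up for illustration. The document side accepts any shape and leaves it to the reading code to cope, while the relational side rejects a bad row at write time.

```python
import sqlite3

# Schema on read: nothing stops documents from having different shapes.
documents = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "full_name": "Grace Hopper"},   # older format, no email
    {"id": 3, "name": "Linus", "email": None},
]


def read_user(doc):
    """Hypothetical reader that has to handle every shape ever written."""
    name = doc.get("name") or doc.get("full_name") or "(unknown)"
    email = doc.get("email") or ""
    return {"id": doc["id"], "name": name, "email": email}


users = [read_user(doc) for doc in documents]

# Schema on write: the database enforces integrity up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
try:
    conn.execute("INSERT INTO user (id, name) VALUES (2, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(users)
print("bad row rejected:", rejected)
```

The flexibility is real, but so is the cost: every reader has to know about every format that was ever stored.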

Relational Databases …

Can provide powerful relationships, particularly with highly connected data.
However, they don’t scale horizontally very well.

Resources We Like

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
Grokking the System Design Interview (Educative.io)
Generate metrics from your logs to view historical trends and track SLOs (Datadog)
Hierarchical Data – Adjacency Lists and Nested Set Models (episode 28)
Hierarchical Data cont’d – Path Enumeration and Closure Tables (episode 29)

Tip of the Week

Presto – The Distributed SQL Query Engine for Big Data. (prestodb.io)
Use the Files app in iOS to proxy files from Box or Google Drive (support.apple.com)
Pin tabs in Chrome for all of your must-have open tabs. (support.google.com)
Use the Microsoft Authenticator to keep all of your one-time passwords in sync across all of your devices. And it requires you to authenticate with it to even see the OTPs! (App Store, Google Play)
Combine poker with learning with Varianto:25’s Git playing cards. (varianto25.com)
Search your Gmail for unread old emails with queries like before:2019/01/01 is:unread.
The new JetBrains Mono font is almost as awesome as the page that describes it. (JetBrains)

Survey Says

Which keyboard do you use?