Podcast: Play in new window | Download
Subscribe: Apple Podcasts | Spotify | TuneIn | RSS
It’s that time again.
This week we answer a question, Allen registers for school, Joe reads some numbers, Michael breaks out the survey results, and Joe cringes at the thought of bidets. It’s time for episode 29! And we thought, what better to talk about than to continue our discussion on hierarchical data solutions.
We were so excited to discuss the survey results, that we forgot to mention a new survey. So, this episode’s survey is:
News & Updates
- Sincere thanks for the reviews: ShonnLy, Mike Myke, and DEADIRON
- The survey results are in, but what do they mean? Science!
- It’s never too late to become a developer.
- Join Allen at Stanford’s Algorithms: Design and Analysis, Part 1 online course! He’ll need a shoulder to cry on.
- NDepend 6 is out! Use offer code NDependV6CodingBlocks before July 2nd to save 20%.
Path Enumeration
Also known as materialized path, in this structure, one column contains each row’s entire heritage. Ways to visualize this include any directory path you’ve ever seen like C:\Windows\System32\drivers\etc or any bread crumbing you’ve see around the web such as Home > Computer Hardware > Hard Drives > SSDs.
Consider three records with ids 1, 2, 3. Record 2 is a child of 1 and record 3 is a child of 2. Assuming the / is used to separate each id, the data would look like:
ID | PATH |
1 | 1/ |
2 | 1/2/ |
3 | 1/2/3/ |
Pros
- Each record knows it’s entire heritage without additional queries.
- Allows multiple parents, but requires being stored in a separate table.
Cons
- Depth of heritage constrained to column width.
- Queries to get descendants require string parsing, which can be a performance hinderance, especially for larger data sets.
- SELECT * FROM mydata WHERE path LIKE ‘windows\system32\%’
- No referential integrity nor normalization.
- Hierarchy changes can be expensive.
- Doesn’t scale well for larger data sets.
Closure Tables
Similar to the nested set model, closure tables are a separate table like the name implies. Every node of the tree knows *all* of it’s descendants.
Considering the same three records as before, the data would look like:
Ancestor | Descendant |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 2 |
2 | 3 |
3 | 3 |
Pros
- Supports referential integrity.
- Indexes well.
- Easy to manipulate “middle” nodes: add/move/delete.
- Easy query sub-trees.
- Allows multiple parents.
- Multiple hierarchies can co-exist.
Cons
- Total row count can be high for large data sets. Worse case, O(n2).
Tips & Tricks
- Launch a browser in Incognito Mode directly from Visual Studio
- Need lots of data to play with? Amazon has you covered with Public Data Sets.
- Be nice to your co-workers and provide smaller merge bubbles. Use git pull‘s rebase option. WARNING: Not recommended for history that has already been published.
Tell A Friend. Or Three.
And if you haven’t already, be sure to visit www.codingblocks.net/review to share your opinion with anonymous strangers. Meh, it’s what people do. And if you already have, who’s better than you? No one, that’s who. You. Rock.
Code, compile, commit. Sigh, repeat.