Comparing 3 Types of Data Modeling (Normalized vs Star Schema vs Data Vault)

2024 ж. 14 Мам.

18 416 Рет қаралды

Without a solid data modeling approach, most architectures fall apart.
And in a previous video I broke down five of the most common data modeling strategies you'll see in a modern stack.
But today we'll look at little deeper into three of the most common examples which include:
1. Normalized
2. Star schema
3. Data Vault
But rather than going deep into code, instead we'll review some images (that I found in a simple Google search) of what some example table/relationship designs can look like in each approach.
My hope is by giving this level of example will help get the point across a little better and clear up any gaps that you may be having after watching the last video.
Thank you for watching!
►► The Starter Guide for The Modern Data Stack (Free PDF)
Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
Timestamps:
0:00 - Intro
1:02 - Normalized (Inmon)
1:37 - Star Schema (Kimball)
2:37 - Data Vault
Title & Tags:
Comparing 3 Types of Data Modeling (Normalized vs Star Schema vs Data Vault)
#kahandatasolutions #dataengineering #datamodeling

Пікірлер

►► The Starter Guide for The Modern Data Stack (Free PDF)→ bit.ly/starter-mds Simplify “modern” architectures + better understand common tools & components
@KahanDataSolutions11 ай бұрын
You had me at star schema. Thanks, a nice and short introduction.
@Milhouse77BS11 ай бұрын
@KahanDataSolutions - great videos, thank you for them. Question per Link tables, are they natively handled by graph databases? Link tables in a data-vault seem like something a graph database would naturally be designed for, no? and further, one could relatively easily conceptualize mapping document based structures in a doc db to vector based structures in a vector/graph db, no?
@user-ut9fb6cl5q8 ай бұрын
Hope you make more videos on data vault topic covering with good examples
@IronmanMahesh11 ай бұрын
Nice video! I have a a question: Is it necessary to use FK in the fact tables of dimensional or denormalized models? I saw this question in many forums on the web and many say "YES", arguing that this allows the data models to not load primary keys into the fact tables that are not in the dimension tables, but others say "NO ", arguing that if the OLTP data model is well built (or should be), the validation of the transactional model itself (non-null PK) should be sufficient and the tables of the dimensional model can be focused on the speed of loading the Data Warehouse. without the need to put primary/foreign keys in the tables, just paying attention to defining default values in each dimension (Dummy value) so that the fact table takes it. The only reason I really use primary keys, is for my dimension tables where I have business keys and surrogate keys and have to deal with SCD1 /SCD2. Is it necessary to add said FK restriction to those dimensions in the fact? Greetings,
@Mirminman11 ай бұрын
- no need, especially maybe you know maybe not - snowflake for example doesn't even have a enabled FK constraints (only disabled)
  @ivani32374 ай бұрын
the characteristic of the data vault model is not clear to me, would these elements be each in a separate database? like what is distinguishing it from a normalized db
@luvincste11 ай бұрын
What's the "previous video" mentioned? I see the one specifically about the Kimball method, but not the others.
@saltrocklamp1994 ай бұрын
- kzhead.info/sun/fMh8nbGjiH6senk/bejne.htmlsi=3YjxTwRk7BQlwvi5
  @KahanDataSolutions4 ай бұрын
I like the data vault example, you explained it well. But I am confused about the Normalized example as I don't see how each of these tables can relate to each other, e.g. Order to OrderItem table, no key to link them. Overall it's a good introduction
@rachellougee668711 ай бұрын
- Thanks Rachel! In hindsight, I probably could have found a better example for normalized. But in the real world, each table would (or at leas should) have a unique primary key. Therefore Order would have an OrderId column, OrderItem would have OrderItemId. However, if it does not have one in the source system, you could create what's called a surrogate key and essentially generate a unique column based on all of the other columns in the row. In terms of relating to each other - best case scenario is there are ids in the tables. For example, OrderItem table has a column for OrderId that you can join the two together. However if that doesn't exist, you'll have to work with the business to understand what ideally should be matched together so that you can join the tables together accordingly. As an example, maybe you find you need to join on something like Product and Location and Date. This can get tricky, but that's the lovely game of data modeling.
  @KahanDataSolutions11 ай бұрын
- @@KahanDataSolutions thanks Michael. I think I got sidetracked by that specific example. What you just said makes sense. Thank you for the clarification.
  @rachellougee668711 ай бұрын
- @@rachellougee6687 You bet, thanks for watching and for the comment!
  @KahanDataSolutions11 ай бұрын
I wish there was a more interactive way to learn/practice modeling other than read 10 books and build out 2 models per year at work
@andynelson234011 ай бұрын
- Haha that's too real. The next best option IMO is to create your own mini-version to get the basics. For example, I remember when I was first learning, I created my own local SQL Server database w/ diff tables based on my favorite hip hop artists. I created tables for their albums, hometowns, names, etc. Tried to keep it separate as if they were "data source tables". Probably only about 5 or so tables with less than 20 records in each. Literally wrote insert scripts with hardcoded values. But it was enough to play around w/ concepts, form relationships and get the gist. Plus it doubled to help me learn how to actually create tables, add primary/foreign keys/etc. Oftentimes you can learn 80-90% of what you need w/ just a small dataset like that vs trying to mimic an enterprise volume - which is what I think most of us think we need and get hung up on. But it's the edge cases & optimizations that you need the work experience for, and cant get in a book. Plus every company has their own nuances. Anyways - that was long winded but hope that was helpful or inspires somebody with new ideas
  @KahanDataSolutions11 ай бұрын
Normalized can also be identified as snowflake schema, no?
@walterwine11 ай бұрын
- not really. Snowflake is an extension of the star schema, with a fact table in the center but the dimensions tables around can have ramifications. A simple example to illustrate this : (FACT_table)
  @ecozturk11 ай бұрын