Data Modeling in the Modern Data Stack

May 14, 2024
85,783 views

Data Modeling is arguably the most impactful decision for a data team.
It determines your architecture and the path that the whole team will follow.
While this is not a new topic, the new tools and tech of the last decade have caused many to reconsider what's best in a modern landscape.
So in today's video I want to break down this topic.
We'll discuss:
1. Why is Data Modeling (still) important?
2. What are the common approaches?
3. What things should you consider?
►► The Starter Guide for Modern Data → bit.ly/starter-mds
Simplify “modern” architectures + better understand common tools & components
Timestamps:
0:00 - Intro
0:44 - Why is Data Modeling Important?
2:33 - Common Approaches
7:39 - Things to Consider
Title & Tags:
Data Modeling in the Modern Data Stack
#kahandatasolutions #dataengineering #datamodeling

Comments
  • Just started out my career. Watched the videos 4 months back and they gave me a high-level overview. Came back after a session with my seniors and now I see the granular details. Thanks!

    @nlishivha1005 14 hours ago
  • Great summary, thank you! Personally, I'm more comfortable with the Kimball approach, but the "modern data stack" part in the title caught my attention, and I think it's important to know which trends align better with modern tech. Also, I couldn't agree more that design is a key part of the process and, if done correctly, is going to prevent a lot of headaches. It would be nice to go deeper into the hybrid model you presented.

    @nlopedebarrios 10 months ago
  • Just because storage is cheap doesn't mean it's unimportant. I/O is often the slowest part of OLAP and ETL queries and directly dictates compute cost. The two current OLAP pricing models (pay by the minute or pay by bytes "scanned") both depend on how much data you load per query: the former because you pay for an idle cluster that is waiting for data, the latter because it literally bills you for how much data you load. Data modeling helps you get the compression and partitioning you need to control and minimize that cost.

    @firefoxmetzger9063 8 months ago
  • Hey Mike, this video was straight 🔥🔥🔥 - This is such an important topic for all businesses taking the journey right now and it really blows my mind how much understanding of modelling has been lost. Your overview of the pros and cons of each was a well balanced and to-the-point summary - I especially love that you correctly call out the source data and scale risks of OBT. I think too many peeps in the MDS are working for SaaS companies where they don't realise that 1) source systems are way less static in enterprise and 2) most enterprises outside of SaaS aren't just okay with massive pipeline code spaghettis. Great video!

    @johncosgrove7727 1 year ago
    • Really appreciate the comment John!

      @KahanDataSolutions 1 year ago
  • Good one. I think taking an actual dataset and walking it through a model would be great.

    @theconfusedchannel6365 1 year ago
  • Wow, this is spot on. I've been using the modern data stack for the last ~8 years or so. Starting out, I was definitely focusing on a star schema/denormalized approach with the MPP databases, but as I learned that they do best with fewer joins and can handle wide tables, I started striving for the OBT approach. In practice, the hybrid is what typically happens. There are so many dimensional tables that vary in need from team to team, so especially in a larger enterprise the hybrid is almost guaranteed. Great video! Love the work you did here.

    @Joeymbryan 7 months ago
    • Appreciate the feedback! What you describe is really similar to my journey as well.

      @KahanDataSolutions 7 months ago
  • Addresses the challenges and considerations when moving to the cloud, but not enough detail on modeling the data itself.

    @shreshti82 1 year ago
  • Awesome video, really enjoyed it. Very clear and straight to the point.

    @ArmstrongNigere 1 year ago
    • Appreciate it! Thanks for watching

      @KahanDataSolutions 1 year ago
  • thank you, incredibly helpful.

    @zulkhaireesulaiman8575 1 year ago
    • Glad it helped!

      @KahanDataSolutions 1 year ago
  • Hi Michael, the hybrid model is an interesting approach. What could be used to transition from the star schema to the OBT data marts? For example, views or materialized views? Or are they separate schemas? Also, in what scenario would this make sense?

    @nlopedebarrios 8 months ago
  • Great overview! Thanks for educating us on the newer approaches. We currently are using Denormalized Modeling. This is due to the fact that all our current needs are around our ERP. The Marketing Dept wants to start collecting more website and estore analytics, which I believe will lead us to a Hybrid model.

    @wingnut29 1 year ago
    • Glad it was helpful! I still think denormalized is a great strategy.

      @KahanDataSolutions 1 year ago
  • Good one, Michael. Well summarized and to the point. I also think the hybrid approach is the best for the MDS.

    @mrcool4uall 7 months ago
  • Oh man, it's so nice to get something substantial on KZhead for data modeling! Awesome awesome stuff. What do you think of unistore? I have yet to dive in, and I'm also very fresh into the industry, but the prospect of combining oltp and olap capabilities is certainly compelling!

    @davemeech 6 months ago
  • great stuff - when are you going on Joe Reis and Matt Housley's podcast?

    @Rex_793 1 year ago
  • Solid overview! I’ve mainly used dimensional modeling with fact/dim tables. A good data model goes a long way within the analytics pipeline, important stuff. Thanks

    @nicky_rads 1 year ago
    • Definitely. Appreciate you sharing your experience

      @KahanDataSolutions 1 year ago
    • I second that. As someone who started out as a DBA, I can say proper DB modeling (in either OLTP or OLAP) will provide peace of mind in the future.

      @datasleek7950 1 year ago
    • Get the data model right and the rest falls into place IMO 👍

      @summer_xo 1 year ago
  • Great Job. Great Presentation.

    @datasleek7950 1 year ago
    • Thank you!

      @KahanDataSolutions 1 year ago
  • Thanks for addressing this often overlooked but important topic! I'm looking for some good sources on dimensional data modeling. Of course I have the Kimball books, but something more practical (books, courses...) and hands-on, perhaps with exercises on various business scenarios / sources would be great. Any ideas?

    @johnytheripper 1 year ago
    • Any luck?

      @deltagamma1442 1 year ago
    • @deltagamma1442 Not really :/. Took some inspiration from the GitLab data handbook, as I'm mostly looking for SaaS use cases.

      @johnytheripper 1 year ago
  • In some cases, we observe another pattern where an industry-standard data model is used after the raw layer.

    @amitsaha7756 1 year ago
  • Great video, thank you. Can you upload a vlog on speech-to-text transcripts using AI?

    @Lima3578user 7 months ago
  • Could you please explain the differences between the different data models (Inmon, Kimball, 3NF, Dimensional Modeling, Data Vault)?

    @user-dx2dg4bd7q 11 months ago
  • Hello, really nice video about data modeling. I was looking for this. It's very difficult to define the right approach for data modeling, since every case is different. In my experience I built a lot of star schemas during my career, but nowadays I see a trend toward one big table in modern data warehouses like BigQuery, Redshift, or Synapse. Do you have the same impression?

    @rpelegrini 1 year ago
    • Yep, I'm seeing the same thing. Truly case by case. I think the concepts of star schema/data models are still very applicable today, mainly b/c of the organization and structure they bring rather than for any performance gains.

      @KahanDataSolutions 1 year ago
  • Hey, the video was top tier. Can you suggest any good course for an in-depth understanding of DM?

    @sunil-de 6 months ago
  • I literally hit this video so fast sincerely thinking I was going to learn "How to date a model." My brain went faster than my eyes and lost the game.

    @bradleymiller437 1 year ago
  • Next: How to date a model!

    @okj4521 8 months ago
  • Thank you very much :))

    @medhatatef7737 1 year ago
    • You're welcome!

      @KahanDataSolutions 1 year ago
  • Hey, your explanation is really calm and clear! Do you have a Udemy course?

    @renvils 1 month ago
  • How do Data Products in their various guises fit with these data modelling concepts?

    @S_B_S1 27 days ago
  • What are the downsides of doing all three in one? Pull raw data from all source systems (Inmon), then model fact and dim tables (Kimball), then build data marts? If storage is getting cheaper, wouldn't this be the best way?

    @theukulelegod 2 months ago
  • *slow clap* thank you, fantastic.

    @jimgillespie3540 1 year ago
  • Click bait. He did not say one word about how to date a model.

    @severn_creek2374 1 year ago
    • Busted

      @KahanDataSolutions 1 year ago
    • I’m dead

      @JB-ve8sk 8 months ago
  • How do you create an LDM in MagicDraw?

    @user-wf7ni3ml6u 10 months ago
  • Just theoretical blah blah in my opinion... great for COOs to talk about stuff they have no idea about.

    @largpack 2 months ago
    • Until you are being grilled about these in an interview with companies like Airbnb, Netflix, and Facebook, you look like a completely clueless idiot, and you get shown the door.

      @moonfire5069 1 month ago