Modern Data Engineering Workflows, Explained

May 14, 2024
4,379 views

Modern data engineering isn't all about tools & technologies.
One area that's often overlooked is the concept of "workflows", in particular the workflows data teams follow to build projects continuously.
This covers environments, naming conventions, automation, and more.
In this video, you will:
- Learn the high-level design of common team workflows
- See an example implementation
- Be able to tell whether you're already following these workflows yourself
Thank you for watching!
►► The Starter Guide for The Modern Data Stack (Free PDF)
Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
Timestamps:
0:00 Intro
0:21 Why It's Important
1:33 Design & Process Review
4:39 Database Example (Snowflake)
Tags:
#kahandatasolutions #dataengineering #datapipeline

Comments
  • ►► The Starter Guide for Modern Data → bit.ly/starter-mds Simplify “modern” architectures + better understand common tools & components

    @KahanDataSolutions · 5 months ago
  • Thanks for this Kahan. Please make a video implementing the workflow like you've done with the CI/CD. Thanks again.

    @jacobukokobili6457 · 5 months ago
  • Hi Kahan, a question I have after watching many of your videos: what about a client's situation makes you think one tool would fit better than another? For example, Snowflake vs. BigQuery.

    @NicoWright-ly6en · 1 month ago
  • A lot of good ideas from your videos have inspired me to improve my development flow.

    @marcosoliveira8731 · 5 months ago
  • It's a very clear explanation.

    @felipecondore4173 · 5 months ago
  • I love it. Already doing it, but it's a good reminder.

    @goosetaculous · 5 months ago
  • Very clear and concise, thank you

    @DATA_RUNNER · 5 months ago
    • Glad it was helpful!

      @KahanDataSolutions · 5 months ago
  • In our setup we have multiple environments (DEV, QA, PROD), all separate, including the raw sources and the ETL. This at least doubles our costs. The setup that you showed eliminates the extra costs for processing and storage by using one environment, right? How do you deal with upgrades and changes in the raw data source layer? For example, a source system whose database schema changes significantly after an upgrade? Just add another schema in the raw database?

    @MrUbbers · 5 months ago
  • Would you need separate dev schemas for the staging and marts? Let's say I want to develop a new mart. Would I put all of those models in the same dev schema before going to production?

    @EMBrown801 · 5 months ago
    • I typically will do that. I like to keep all tables/views in a single Dev schema (e.g. all Staging, Warehouse, Marts) to avoid excessive objects and keep it simple. The way I see it, nobody else is really looking at that schema, so perfect separation & organization isn't as important. What's more important is that you can confirm models deploy, check the data, etc. Then once you move to "production", separate things out by specific schemas. Hope that helps!

      @KahanDataSolutions · 5 months ago
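
The approach in the reply above (one shared schema for every layer during development, one schema per layer in production) can be sketched as a small lookup. This is only an illustrative sketch: the names `Model` and `resolve_schema`, the `DEV` schema name, and the `dev`/`prod` target labels are assumptions made for the example, not something taken from the video or the comment.

```python
from dataclasses import dataclass

# Layer names borrowed from the reply above; the exact spelling is an assumption.
LAYERS = ("staging", "warehouse", "marts")


@dataclass(frozen=True)
class Model:
    """A table or view to be built, tagged with the layer it belongs to."""
    name: str
    layer: str  # expected to be one of LAYERS


def resolve_schema(model: Model, target: str, dev_schema: str = "DEV") -> str:
    """Return the schema a model would be built in for the given target.

    dev  -> every layer lands in one shared schema (hypothetical name: DEV)
    prod -> each layer gets its own schema, named after the layer
    """
    if model.layer not in LAYERS:
        raise ValueError(f"unknown layer: {model.layer!r}")
    if target == "dev":
        return dev_schema
    if target == "prod":
        return model.layer.upper()
    raise ValueError(f"unknown target: {target!r}")


if __name__ == "__main__":
    models = [
        Model("stg_orders", "staging"),
        Model("dim_customers", "warehouse"),
        Model("fct_sales", "marts"),
    ]
    for target in ("dev", "prod"):
        for m in models:
            print(f"{target}: {resolve_schema(m, target)}.{m.name}")
```

Keeping the rule in one function means the same model definitions can be deployed to either target without editing them; only the target label changes.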