3 Ways to Deploy Data Projects

May 14, 2024
4,168 views

When working on a data team, it's one thing to build the code; it's another to plan and strategize your deployment approach.
Over the years I've worked on a handful of different teams whose strategies have differed, whether because of personal preference or the technology chosen.
So in this video, I've compiled the three most common approaches I've seen to help you understand the strategies teams are taking.
This video is based on a dbt (data build tool) project; dbt is a data transformation tool.
But these approaches and concepts can work regardless of tool selection.
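In a dbt project, these deployment approaches usually come down to how connection targets are configured. Below is a minimal, hypothetical `profiles.yml` sketch for the "Isolated Development" style against Snowflake; all names (`my_project`, `DEV`/`PROD`, warehouses, placeholders) are illustrative assumptions, not taken from the video:

```yaml
# Hypothetical profiles.yml for a Snowflake-backed dbt project.
# All names and credentials below are placeholders.
my_project:
  target: dev                  # default target for local development
  outputs:
    dev:
      type: snowflake
      account: "<account_identifier>"
      user: "<username>"
      password: "<password>"
      warehouse: DEV_WH
      database: DEV            # isolated development database
      schema: dbt_<your_name>  # per-developer schema
      threads: 4
    prod:
      type: snowflake
      account: "<account_identifier>"
      user: "<service_user>"
      password: "<password>"
      warehouse: PROD_WH
      database: PROD           # deployment target
      schema: ANALYTICS
      threads: 8
```

With a setup like this, a plain `dbt run` builds into each developer's isolated schema, while a scheduler or CI job runs `dbt run --target prod` to deploy to production.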
Thank you for watching!
►► The Starter Guide for The Modern Data Stack (Free PDF)
Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
Timestamps:
0:00 - Intro
0:48 - Separate Source Data
1:14 - All-in-one
2:23 - Isolated Development
Title & Tags:
Data Deployment Strategies (3 approaches)
#kahandatasolutions #dataengineering #databases

Comments
  • ►► The Starter Guide for Modern Data → bit.ly/starter-mds Simplify “modern” architectures + better understand common tools & components

    @KahanDataSolutions • a year ago
  • It was helpful, thank you!

    @sujaa1000 • a year ago
    • Glad to hear it! Thanks for watching

      @KahanDataSolutions • a year ago
  • I do approach 2 if the client doesn't have requirements for separate databases or even separate Snowflake accounts, which happens when IT is run by the old school

    @YEM_ • a year ago
  • We went with two databases: PROD and DEV. PROD contains everything, using schemas to separate Raw, Modelling, and Analytics outputs (Analytics being purely 1:1 views over the 'final' tables from the Modelling schema). We then leverage zero-copy clones in Snowflake to replicate those schemas back to DEV, and all development (including user dbt schemas) happens in that separate database, while users know they're still leveraging PROD data. We also have a QA database which, while not yet fully embedded into the process, will be used to test dry-run dbt runs prior to merging. It'll be structurally identical to PROD but with only a few rows per table, just so we can verify a full dbt run without actually performing one; we've had a few issues come up in the past that a dbt compile alone didn't catch.
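The clone step described in the comment above can be sketched in Snowflake DDL. This is a minimal illustration; the database and schema names (`PROD`, `DEV`, `MODELLING`, `ANALYTICS`) are assumptions inferred from the comment, not an exact script:

```sql
-- Assumed layout: PROD/DEV databases, each with MODELLING and ANALYTICS schemas.
-- Zero-copy clone: DEV gets an instant copy of PROD's schemas without duplicating
-- storage; Snowflake materializes new storage only when cloned tables are modified.
CREATE OR REPLACE SCHEMA DEV.MODELLING CLONE PROD.MODELLING;
CREATE OR REPLACE SCHEMA DEV.ANALYTICS CLONE PROD.ANALYTICS;
```

Developers then build their dbt schemas inside DEV while still working against production-shaped data.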

    @aldredd • a year ago
  • Hi Michael, are the analytics data marts built off of the analytics staging schema? The way I see it is that your raw database has a schema per source system, each schema here is in the same structure as the source system (3nf). From there, your staging schema in the analytics database makes up your enterprise data warehouse model (Inmon, Kimball, Data Vault) by querying the data from the raw database. Finally, the data marts schema is used to store subsets of the data in the staging schema, either for specific departments or reporting projects. Is my understanding correct here?

    @joshellis4966 • a year ago
  • I prefer having one database and schemas as layers; I like the medallion architecture

    @Papa91echo • a year ago