Data Automation (CI/CD) with a Real Life Example

28 Apr 2024
7,600 views

One of the most fun aspects of being a data engineer is creating different automations.
And in particular, one area that's really important is CI/CD, which stands for Continuous Integration and Continuous Deployment.
This is where you can automate your testing and release strategy.
But I also understand that this concept can be a little vague or unclear if you haven't seen it in action.
So in today's video, I'll show you a real-life example of how to use GitHub to make this happen.
This is just one example of why people really like code-based tools: they make it possible to automate tasks like this.
It applies not only to your deployments but also, as we'll mostly cover in this video, to automating your data quality checks.
Thank you for watching!
►► The Starter Guide for The Modern Data Stack (Free PDF)
Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
Timestamps:
0:00 - Intro
0:43 - Create Workflow File
1:18 - Review File Layout
2:55 - Use Pre-built Actions
4:06 - Trigger Workflow
Title & Tags:
Data Automation (CI/CD) with a Real Life Example
#kahandatasolutions #dataengineering #automation

Comments
  • ►► The Starter Guide for The Modern Data Stack (Free PDF) → bit.ly/starter-mds Simplify “modern” architectures + better understand common tools & components

    @KahanDataSolutions, 11 months ago
  • I was literally going to research CI/CD, and I randomly opened YouTube to watch a Dallas Mavs podcast and saw your video. Watched and understood. Thank you!

    @amazing-graceolutomilayo5041, 11 months ago
  • Awesome content! Any chance you'd be able to do a video on dbt Cloud CI? My team is using dbt Cloud and we're definitely gonna be implementing the Slim CI jobs.

    @Fajita_boi_swag, 11 months ago
  • We're in the process of setting much of this up. We want our end workflow to be something like:
      * Clone PROD database (using Snowflake ZCC)
      * dbt build against that clone
      * Perform a data-diff between PROD and the clone
      * Report results of the dbt build AND any data-diff variations for review
      * Bring down the clone
    We have a bit of background work to do until we're ready for that, so for now we just do a `dbt compile` step. It doesn't catch everything, but it does at least catch simple syntax issues, or stuff like invalid docs syntax, etc.

    @aldredd, 11 months ago
  • Thank you!

    @LandonColvig, 2 months ago
  • Great video! I've been wanting to learn CI/CD with dbt, but how does it test the functionality of dbt models? Doesn't it need the actual data that the dbt models operate on? I thought GitHub Actions runs on some ephemeral machine for GitHub CI/CD (which doesn't have access to my data), or am I wrong?

    @melnikovjnr, 11 months ago
    • Great question. I purposely avoided getting too far into the dbt-specific requirements b/c it can get a little confusing, but since you asked, here is how it works.
      First: when talking about dbt specifically, it's important to remember that it's effectively just compiling your code and sending it to your database to be run.
      Here's how it is still able to connect every time a workflow is triggered, even from a virtual runner on GitHub:
      1. Configure a new profiles.yml in your project's root directory (it can be elsewhere, but be sure to align with step 4)
      2. Set the username/password credentials in profiles.yml to use environment (ENV) variables
      3. Set the ENV variables during the workflow by passing in SECRETS (so it's secure; this is set in the GitHub UI, outside of this video)
      4. In your dbt command, set the `--profiles-dir .` flag so that it uses this root profiles.yml file instead of the default
      5. Now, when the CI/CD workflow runs, it spins up a hosted runner, the credentials are passed into profiles.yml (secret > env var > profile), and it can then successfully connect to your DB, run the dbt queries against your data & do everything as normal.
      This was real confusing for me at first too, but hopefully that clears it up a bit. I have a few other videos on using Secrets & Creating Actions in my GitHub playlist if you want to see more!

      @KahanDataSolutions, 11 months ago
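
      The five steps in the reply above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the setup shown in the video: the profile name `my_project`, the `ci` target, the Postgres adapter, the `DBT_HOST` / `DBT_USER` / `DBT_PASSWORD` variable and secret names, the database and schema names, the workflow file name, and the pull-request trigger are all hypothetical.

      Steps 1 and 2, a profiles.yml in the project root that reads its credentials from environment variables:

```yaml
# profiles.yml in the project root (step 1).
# The top-level key must match the `profile:` value in dbt_project.yml;
# `my_project` is a hypothetical name.
my_project:
  target: ci
  outputs:
    ci:
      type: postgres                              # assumption: use your own adapter
      host: "{{ env_var('DBT_HOST') }}"           # step 2: values come from ENV
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics                           # hypothetical database name
      schema: dbt_ci                              # hypothetical schema for CI runs
      threads: 4
```

      Steps 3 to 5, a workflow that maps repository secrets to those environment variables and points dbt at the root profiles.yml:

```yaml
# .github/workflows/dbt_ci.yml (hypothetical file name)
name: dbt CI

on:
  pull_request:
    branches: [main]          # assumption: run the checks on PRs into main

jobs:
  dbt-build:
    runs-on: ubuntu-latest    # GitHub-hosted runner (step 5)
    env:
      # Step 3: secrets are defined in the GitHub UI and exposed as ENV variables.
      DBT_HOST: ${{ secrets.DBT_HOST }}
      DBT_USER: ${{ secrets.DBT_USER }}
      DBT_PASSWORD: ${{ secrets.DBT_PASSWORD }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt
        run: pip install dbt-postgres   # assumption: match the adapter in profiles.yml
      - name: Build and test models
        # Step 4: use the profiles.yml in the repo root instead of ~/.dbt
        run: dbt build --profiles-dir .
```

      Because the credentials only ever live in GitHub secrets, the profiles.yml file itself can be committed to the repository without exposing them.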
    • I see, gotta try it. Thanks for detailed answer!

      @melnikovjnr, 11 months ago
    • @KahanDataSolutions Awesome. I have a related question: how should deployment to other environments work? Can it follow the Gitflow pattern?

      @jeancarloflorescarrasco412, 11 months ago
  • I guess this is all a bit too advanced for me. Where are you making all these changes in the beginning? VS Code? I cannot see the full window, so it's difficult to follow.

    @eugenmalatov5470, 11 months ago
  • This dude has really nice hair

    @opethmike, 1 month ago