Orchestrate custom data transformations in your destination with Transformations for dbt Core*.
NOTE: Contact Fivetran Support to enable Transformations for dbt Core - Scheduled in Fivetran for your account.
Fivetran integrates with dbt Core to power our transformations. dbt Core, by dbt Labs, is an open-source transformation tool that enables you to perform sophisticated data transformations in your destination using simple SQL statements. With dbt Core, you can:
- Write and test SQL transformations
- Use version control with your transformations
- Create and share documentation about your transformations
Once you have set up dbt Core, you write SQL SELECT statements (also known as “dbt models”) in a Git repository to transform your data. dbt Core runs these SQL statements in your destination to build tables and views. dbt Core honors dependencies between your dbt models so everything is built in the correct order.
To work with dbt Core, you can either use the dbt CLI, a free and open-source command line interface, or dbt Cloud, dbt Labs’ hosted service. Fivetran can run dbt projects created with either dbt Cloud or dbt CLI.
There are two types of Transformations for dbt Core:
- Scheduled in Fivetran (recommended): We run your dbt models in your destination according to the schedule that you set in the Fivetran dashboard.
- Scheduled in Code: We run your dbt models in your destination according to the schedule that you set in your dbt project.
Scheduled in Fivetran
Fivetran connects to your Git provider and runs your dbt models in your destination according to the schedule that you choose in the Fivetran dashboard. We sync your dbt models from your Git provider every few minutes to ensure that we are up to date.
You create a transformation in the Fivetran dashboard for each dbt model that you want Fivetran to run. Each transformation consists of the following elements:
- Output model: A dbt model that transforms your data so it’s ready for analytics.
- Output model lineage: All upstream models that are needed to produce the output model, starting from your source table references in dbt Core.
- Schedule: A customizable schedule that determines how often Fivetran runs your transformation.
IMPORTANT: Each transformation references a single output model but executes all upstream models during each run.
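To make the idea of an output model lineage concrete, here is a minimal sketch that collects every upstream node of an output model from a small dependency map. The model names and the `DEPENDS_ON` mapping are hypothetical, and the traversal is only an illustration, not Fivetran's implementation.

```python
# Hypothetical dependency map: each node lists the nodes it selects from.
DEPENDS_ON = {
    "source.oracle.orders": [],
    "source.salesforce.accounts": [],
    "stg_orders": ["source.oracle.orders"],
    "stg_accounts": ["source.salesforce.accounts"],
    "customers": ["stg_orders", "stg_accounts"],
    "churn": ["customers"],
}


def upstream_lineage(output_model, depends_on):
    """Return every node that must be built before output_model."""
    seen = set()
    stack = list(depends_on[output_model])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            # Keep walking towards the source tables.
            stack.extend(depends_on.get(node, []))
    return seen


print(sorted(upstream_lineage("churn", DEPENDS_ON)))
```

In this sketch, a transformation whose output model is `churn` would also execute `customers`, `stg_orders`, and `stg_accounts` on each run.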
By default, new transformations have the same schedule as their associated connectors, which means that Fivetran automatically runs your transformations as soon as we update your destination data. These integrated schedules reduce data latency and ensure that your analytics tools reflect new data as quickly as possible. Integrated schedules can also reduce compute costs, since downstream transformations do not run if their associated connector fails to sync. Learn more in the integrated scheduling section.
Learn how to manage transformations in your Fivetran dashboard.
TIP: If you want to customize your transformation schedule, we recommend that you schedule transformation runs in your Fivetran dashboard. However, you can use a configuration file in your Git repository instead if you prefer.
To run a transformation with integrated scheduling, Fivetran performs the following steps:
- Compile your dbt project and inspect the automatically generated manifest file to build a complete data lineage graph for your dbt models.
- Match source table references in the dbt models to the source table names written by Fivetran connectors.
- Unify your pipelines into end-to-end directed graphs.
- Execute the pipelines in order, which minimizes latency on the analytics-ready tables in your destination.
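The steps above can be sketched roughly as follows. The `MANIFEST` dictionary is a tiny stand-in for the parts of dbt's `manifest.json` that this example reads (`nodes`, `sources`, and `depends_on.nodes`); the project name `demo`, the `CONNECTOR_TABLES` mapping, and the `build_pipeline` helper are hypothetical. It is an illustration under those assumptions, not a description of how Fivetran actually does it.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# A tiny stand-in for the parts of manifest.json used in this sketch.
MANIFEST = {
    "sources": {
        "source.demo.oracle.orders": {"schema": "oracle", "name": "orders"},
    },
    "nodes": {
        "model.demo.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.demo.oracle.orders"]},
        },
        "model.demo.customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.demo.stg_orders"]},
        },
    },
}

# Hypothetical: tables written by connectors, keyed by (schema, table).
CONNECTOR_TABLES = {("oracle", "orders"): "oracle_connector"}


def build_pipeline(manifest, connector_tables):
    """Return connectors, sources, and models in a valid execution order."""
    graph = {}
    # Lineage between models comes from each node's depends_on list.
    for uid, node in manifest["nodes"].items():
        if node["resource_type"] == "model":
            graph[uid] = set(node["depends_on"]["nodes"])
    # Match each source table to the connector that writes it, if any.
    for uid, src in manifest["sources"].items():
        connector = connector_tables.get((src["schema"], src["name"]))
        graph[uid] = {connector} if connector else set()
    # A topological order guarantees upstream nodes run first.
    return list(TopologicalSorter(graph).static_order())


for step in build_pipeline(MANIFEST, CONNECTOR_TABLES):
    print(step)
```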
Fivetran pipelines use the following elements:
- The start is the interval that initiates the pipeline.
- A connector updates source tables in the destination.
- A junction waits for multiple connectors to finish syncing before it triggers a dbt transformation.
- A transformation is a model or a collection of models that updates downstream tables in the destination.
- An output model generates an analytics-ready table. It is typically a leaf node on your data lineage graph.
- A test is an assertion that you make about the models in your dbt project. A test may succeed or fail independently of model execution.
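As an illustration of the junction element, the sketch below tracks which connectors are still syncing and reports when the downstream transformation may start. The `Junction` class and its method name are hypothetical and exist only for this example.

```python
class Junction:
    """Waits for a set of connectors before releasing a transformation."""

    def __init__(self, connectors):
        # Connectors that have not finished syncing yet.
        self.pending = set(connectors)

    def connector_finished(self, name):
        """Record a finished sync; return True once nothing is pending."""
        self.pending.discard(name)
        return not self.pending


junction = Junction(["oracle", "netsuite"])
print(junction.connector_finished("oracle"))    # netsuite is still pending
print(junction.connector_finished("netsuite"))  # all connectors are done
```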
You may prefer not to run some transformations that are logically downstream of the start node.
For example, if the churn calculation is very expensive, you may want to run it hourly instead of every 15 minutes with the oracle connector and the customers model. In this case, you can create a new schedule for the churn model, which introduces a separate start node. Whenever an output model is executed in Fivetran, all upstream models are rebuilt as part of the transformation.
While you can set downstream models on varied schedules, you can only integrate downstream models with connectors when their schedules match. In this example, all connectors run every 15 minutes. The customers model runs every 15 minutes and is therefore integrated with upstream connectors, but the revenue model runs every hour and the churn model runs once every 24 hours.
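The matching rule can be expressed as a small check. This is a sketch only: the `is_integrated` helper is hypothetical, and schedules are represented as plain intervals in minutes for simplicity.

```python
def is_integrated(model_interval, connector_intervals):
    """A model is integrated only if every upstream connector shares its schedule."""
    return all(interval == model_interval for interval in connector_intervals)


# All upstream connectors sync every 15 minutes, as in the example above.
connectors = [15, 15, 15]

print(is_integrated(15, connectors))       # customers: every 15 minutes
print(is_integrated(60, connectors))       # revenue: every hour
print(is_integrated(24 * 60, connectors))  # churn: every 24 hours
```

Only the customers model shares the connectors' schedule, so it is the only one of the three that is integrated.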
Fivetran comes with a fixed set of start nodes corresponding to different sync frequencies. When you select a frequency in the dashboard, the pipelines that activate those syncs are aware of overlaps and automatically adjust to them. In this example, the oracle connector is on a 15-minute schedule, the netsuite connector is on an hourly schedule, and the salesforce connector is on a 24-hour schedule.
- The 15-minute node activates every 15 minutes, except when the 1-hour or 24-hour node activates.
- The 1-hour node activates every hour, except when the 24-hour node activates.
- The 24-hour node activates all three connector syncs.
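The overlap rules above can be worked through with a short sketch. It assumes, purely for illustration, that all schedules are aligned to minute 0 of the day; the function names and the `CONNECTOR_FREQUENCY` mapping are hypothetical. The assumption that the 1-hour node also triggers the 15-minute connector is inferred from the rules above rather than stated in them.

```python
# Sync frequency in minutes for the connectors in the example.
CONNECTOR_FREQUENCY = {"oracle": 15, "netsuite": 60, "salesforce": 24 * 60}


def active_start_node(minute):
    """Return the single start node that activates at this minute, if any."""
    if minute % (24 * 60) == 0:
        return "24 hour"   # takes precedence over the other two nodes
    if minute % 60 == 0:
        return "1 hour"    # takes precedence over the 15-minute node
    if minute % 15 == 0:
        return "15 minute"
    return None


def connectors_to_sync(minute):
    """Connectors whose sync is due at this minute (assumed alignment at 0)."""
    return sorted(
        name for name, freq in CONNECTOR_FREQUENCY.items() if minute % freq == 0
    )


for minute in (0, 15, 60):
    print(minute, active_start_node(minute), connectors_to_sync(minute))
```

Because only one start node fires at any given minute, each connector sync is triggered once even when several frequencies coincide.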
Data lineage graphs BETA
Data lineage graphs (DLGs) visualize your data pipeline end-to-end. DLGs show the dependencies between your dbt models so that you can track the flow of data from your connectors to your destination. DLGs also display the run status for each connector and model in your pipeline, which you can use to troubleshoot failed transformation runs.
Each DLG consists of the following elements (also called “nodes”):
- Connector: The connector that syncs data from your source. You may have more than one connector per output model.
- Source table: The tables in your data source that your dbt models reference.
- Intermediate model: All intermediate models that are needed to produce the output model.
- Output model: A dbt model that transforms your data so it’s ready for analytics.
Learn how to view DLGs in your Fivetran dashboard.
Transformations scheduled in Fivetran have the following limitations:
- You cannot manually trigger a transformation run. If you want to change when a transformation runs, you must edit its schedule.
- You cannot cancel a transformation run. If you want a transformation to stop running, you must delete it.
- DLGs do not show node-level run status in real time. You must wait until the transformation run finishes to see the run status for each transformation.
Scheduled in Code
Fivetran connects to your Git provider and runs your dbt models in your destination according to the schedule that you set in your dbt project’s deployment.yml file. We sync your dbt models from your Git provider every few minutes to ensure that we are up to date.
To run a transformation with Scheduled in Code, Fivetran performs the following steps:
- Read the deployment.yml file in your dbt project.
- Use the deployment.yml file to create new jobs or update existing jobs.
- Run the jobs according to the schedule you specified in your deployment.yml file. For each job run, we do the following:
i. Prepare an environment with the corresponding dbt CLI version installed, the clean project working directory, and the
ii. Execute the dbt deps service command to install the required project packages.
iii. Execute the steps in each job one-by-one until all scheduled jobs have run.
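The per-job sequence can be sketched as follows. The `job` dictionary is a hypothetical stand-in and does not reflect the real deployment.yml schema; `run_job` is likewise an illustrative helper. By default it shells out to the dbt CLI and raises if a command fails, which is an assumption of this sketch rather than documented behavior.

```python
import shlex
import subprocess


def run_job(job, run_command=None):
    """Run `dbt deps`, then every step of the job in the listed order."""
    if run_command is None:
        # Default runner: call the real CLI (assumes dbt is installed).
        def run_command(command):
            subprocess.run(shlex.split(command), check=True)

    # Install the packages the project requires before running any step.
    run_command("dbt deps")
    # Execute the job's steps one by one.
    for step in job["steps"]:
        run_command(step)


# Hypothetical job definition for illustration only.
job = {"name": "nightly", "steps": ["dbt run", "dbt test"]}

# Dry run: record the commands instead of executing them.
executed = []
run_job(job, run_command=executed.append)
print(executed)
```

Passing a recording function as `run_command` lets you inspect the command sequence without a dbt installation.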
Transformations scheduled in code have the following limitations:
- You cannot integrate a transformation’s schedule with a connector schedule.
- You cannot add, edit, or delete a transformation or its schedule on the Fivetran dashboard. You must do so in your dbt project.
- You cannot view a data lineage graph for your transformation in the Fivetran dashboard.
Fivetran supports Transformations for dbt Core for the following destinations:
Fivetran data models
IMPORTANT: To use Fivetran’s data models, you must have a BigQuery, Redshift, or Snowflake destination.
To learn how to use Transformations for dbt Core, follow the setup guide that applies to you:
- To schedule transformations in the Fivetran dashboard, follow the Scheduled in Fivetran setup guide.
- To schedule transformations in your dbt code, follow the Scheduled in Code setup guide.
To see common use cases for Transformations for dbt Core - Scheduled in Fivetran, see our Use Cases documentation.
* dbt Core is a trademark of dbt Labs, Inc. All rights therein are reserved to dbt Labs, Inc. Fivetran Transformations is not a product or service of or endorsed by dbt Labs, Inc.