Introduction
Find out how you can transform only new data that Fivetran replicates to your destination, rather than the entire dataset.
Recommendation
- Instead of using functions like `CREATE OR REPLACE TABLE` you can leverage an `INSERT INTO` statement paired with a `WHERE` clause directed at the `_fivetran_synced` column
- The `_fivetran_synced` column is a Fivetran added column that indicates the start time of the job that synced the corresponding row
An example of a transformation in Snowflake:
CREATE TABLE IF NOT EXISTS “SCHEMA”.”TRANSFORMED_TABLE” AS
SELECT
column_1,
_fivetran_synced
FROM “SCHEMA”.”TABLE”;
INSERT INTO “SCHEMA”.”TRANSFORMED_TABLE”
(column_1, _fivetran_synced)
SELECT
column_1,
_fivetran_synced
FROM “SCHEMA”.”TABLE”
WHERE “SCHEMA”.”TABLE”._fivetran_synced >= (
SELECT
MAX(_fivetran_synced)
FROM “SCHEMA”.”TRANSFORMED_TABLE”
);
Considerations
- This transformation method will keep all changes made to the data since the transformation started.
- Keep in mind the transformation frequency vs the sync frequency of data sources. For example, if your transformation frequency is every 4 hours and your sync frequency is every 2 hours, then you might miss records.
- When creating a Fivetran transformation, be sure to use the syntax of your corresponding destination.
- An alternative to this is creating or replacing a view that has benefits highlighted here . If you are set on creating a table, this is a good route to increase efficiency and only transform new data. Keep in mind that this transformation will still perform a full table scan.