Destination Improvement: Snowflake Data Loading
It seems Fivetran's data loading strategy involves creating staging tables (in some cases temporary) in the Fivetran staging schemas and then dropping them on each run. These operations are unnecessary: schema changes on these objects are infrequent, so using permanent staging tables instead of temporary ones would reduce the cloud services (CSL) overhead of all this per-run DDL.
e.g.
CREATE TEMPORARY TABLE "FIVETRAN_FROCK_NINJA_STAGING"."NETSUITE_ACCOUNTING-STAGING-35CEAFBE-59F9-4FD7-AAD9-E2B66215BD8B" ("ACCOUNTING_PERIOD_ID" DOUBLE PRECISION , "FIVETRAN_INDEX" VARCHAR(256) , "CLOSED" VARCHAR(12) , "CLOSED_ACCOUNTS_PAYABLE" VARCHAR(12) , "CLOSED_ACCOUNTS_RECEIVABLE" VARCHAR(12) , "CLOSED_ALL" VARCHAR(12) , "CLOSED_ON" TIMESTAMP_TZ , "CLOSED_PAYROLL" VARCHAR(12) , "DATE_LAST_MODIFIED" TIMESTAMP_TZ , "ENDING" TIMESTAMP_TZ , "FISCAL_CALENDAR_ID" DOUBLE PRECISION , "FULL_NAME" VARCHAR(2388) , "ISINACTIVE" VARCHAR(12) , "IS_ADJUSTMENT" VARCHAR(12) , "LOCKED_ACCOUNTS_PAYABLE" VARCHAR(12) , "LOCKED_ACCOUNTS_RECEIVABLE" VARCHAR(12) , "LOCKED_ALL" VARCHAR(12) , "LOCKED_PAYROLL" VARCHAR(12) , "NAME" VARCHAR(256) , "PARENT_ID" DOUBLE PRECISION , "QUARTER" VARCHAR(12) , "STARTING" TIMESTAMP_TZ , "YEAR_0" VARCHAR(12) , "YEAR_ID" DOUBLE PRECISION , "_FIVETRAN_DELETED" BOOLEAN , "DATE_DELETED" TIMESTAMP_TZ , "_FIVETRAN_SYNCED" TIMESTAMP_TZ )
CREATE TEMPORARY TABLE "FIVETRAN_FROCK_NINJA_STAGING"."SALESFORCE_BOOKING_-STAGING-E9C9A318-8BDC-447E-99C0-3EACB10A8C59" ("ID" VARCHAR(18) , "IS_DELETED" BOOLEAN , "PARENT_ID" VARCHAR(18) , "CREATED_BY_ID" VARCHAR(18) , "CREATED_DATE" TIMESTAMP_TZ , "FIELD" VARCHAR(765) , "OLD_VALUE" VARCHAR(765) , "NEW_VALUE" VARCHAR(765) , "_FIVETRAN_SYNCED" TIMESTAMP_TZ )
DROP TABLE "FIVETRAN_FROCK_NINJA_STAGING"."NETSUITE_SYSTEM_NOT-STAGING-9724D356-6F44-4BAB-BB1D-7A8356406683"
DROP TABLE "FIVETRAN_FROCK_NINJA_STAGING"."SALESFORCE_CONTACT_-STAGING-E3CE4646-492B-4407-A877-833B8AC574DB"
The proposal is the following:
- Create persistent tables in the staging schema, avoiding the per-load creates and drops. Each core table should have a corresponding stage table (a sketch of the pattern follows this list).
- Truncate and reload the stage table each run before the final merge into the core table in the relevant schema.
- Any time the core table schema changes (based on changes identified at the source), run the same DDL on the corresponding stage table.
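As a rough sketch of the proposed pattern, using the ACCOUNTING_PERIOD example above (all object and stage names here are hypothetical, the column list is trimmed to three columns, and Fivetran's actual load path may differ):
-- One-time setup: a persistent stage table mirroring the core table.
CREATE TABLE IF NOT EXISTS FIVETRAN_FROCK_NINJA_STAGING.NETSUITE_ACCOUNTING_PERIOD_STG
  LIKE NETSUITE.ACCOUNTING_PERIOD;

-- Each run: truncate and reload instead of CREATE TEMPORARY TABLE / DROP TABLE.
TRUNCATE TABLE FIVETRAN_FROCK_NINJA_STAGING.NETSUITE_ACCOUNTING_PERIOD_STG;
COPY INTO FIVETRAN_FROCK_NINJA_STAGING.NETSUITE_ACCOUNTING_PERIOD_STG
  FROM @FIVETRAN_INTERNAL_STAGE/accounting_period/
  FILE_FORMAT = (TYPE = CSV);

-- Final merge into the core table.
MERGE INTO NETSUITE.ACCOUNTING_PERIOD t
USING FIVETRAN_FROCK_NINJA_STAGING.NETSUITE_ACCOUNTING_PERIOD_STG s
  ON t.ACCOUNTING_PERIOD_ID = s.ACCOUNTING_PERIOD_ID
WHEN MATCHED THEN UPDATE SET
  t.NAME = s.NAME, t._FIVETRAN_SYNCED = s._FIVETRAN_SYNCED
WHEN NOT MATCHED THEN INSERT (ACCOUNTING_PERIOD_ID, NAME, _FIVETRAN_SYNCED)
  VALUES (s.ACCOUNTING_PERIOD_ID, s.NAME, s._FIVETRAN_SYNCED);

-- On a source schema change, apply the same DDL to both tables.
ALTER TABLE NETSUITE.ACCOUNTING_PERIOD ADD COLUMN FISCAL_QUARTER VARCHAR(12);
ALTER TABLE FIVETRAN_FROCK_NINJA_STAGING.NETSUITE_ACCOUNTING_PERIOD_STG
  ADD COLUMN FISCAL_QUARTER VARCHAR(12);
Since truncating a permanent table is also a metadata operation in Snowflake, the steady-state cost per run would be dominated by the COPY and MERGE rather than table lifecycle DDL.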
Official comment
Hi Shibin,
Thanks for the feedback! Snowflake CSL costs have certainly come up before, and we understand your desire to minimize these costs as much as possible.
One potential concern with relying on persistent tables in the destination is that if they were changed (possibly by a user of that warehouse), it could severely impact our replication processes. Also, significant processing steps occur on Fivetran's servers before we create the staging tables you are referring to.
I hear your pain point and the need to minimize CSL costs, and Snowflake optimizations are always on our radar. Thank you for this feedback; I will be sure to discuss it with our teams. Please let me know if you have any questions or thoughts!