Destination Improvement: Snowflake: Make schema changes on large tables in batches
AnsweredWe have a large table in Snowflake that's around 700 million rows that loads from a parquet file. Occasionally there's a schema change, and Fivetran tries to update the data type on the table.
- Create a temporary column with the updated data type
- Update the temporary column with the type-cast values from the old column
- Drop the old column
- Rename the temporary column to the correct name
The problem is that our regular Fivetran warehouse isn't large enough to update all 700 million rows at once. We have fixed this by manually running the above process, but for step 2 we update in batches - usually based on _fivetran_synced.
Request for Fivetran: If a query like step 2 above times out, batch out the update statements to contain the impact on the warehouse and hopefully get the query to run without timing out.
-
Hi Daniel,
Thank you for submitting this request. This is a valuable idea for handling schema changes in large Snowflake tables more efficiently. I have added your suggestion to our feature improvements backlog.
To help us better understand the underlying challenge, could you share more details about the table or workloads where this is most valuable? For instance, are there particular types of schema changes (data type conversions, column additions/removals, etc.) that are most problematic, or specific timeout/error scenarios you've encountered? Any additional context on your update batching strategy would also be helpful.
We will keep the community updated on this thread with any progress.
Thanks,
Egidio
Please sign in to leave a comment.
Comments
1 comment