Community

Other: General improvements for records syncs in Snowflake destinations

Answered

Joao Lisboa User

November 26, 2024 05:22

Hi team!

We noticed that. by looking at the `merge` operations that Fivetran performs under the hoods to update tables and records in the destination, it seems that there are some opportunity to improve performance and reduce costs (on the Customer side) for those operations:

- The value of `_fivetran_id` is a randomly generated string. If it could be replaced by a incremental numeric value (per table), it would be easier to Snowflake to find the values without needing to fully scan the table

- Along with the `_fivetran_id`, if the table had an additional `date` (not `datetime`) field derived from `event_time` (example: `event_time::date as event_date`) that can support the merge's `on` condition to make it even faster.

Note that the recommendations above are only applicable to Snowflake and would likely result in better performance in general, and reflect in lower credit consume from the destination, increasing the customers satisfaction overall :)

I'm happy to provide more context to the engineering team if that helps!

Regards,

Joao (Canva)

Please sign in to leave a comment.

Comments

1 comment

Official comment

Eric O'Connor Fivetranner
- May 14, 2025 21:52
Hi Joao,

The `_fivetran_id` is not a random number, but is rather a surrogate primary key that we create by hashing the contents of each row (documented here). This is only done in cases where there is no defined primary key available from the source system.

Since this is a hash, it is not possible to make the output sequential.

It's also not possible for us to MERGE ON a non-primary key value. If `event_time` were a primary key, I believe Snowflake would be able to apply partition pruning even without a truncated date type.

Please reach out directly to support if you're facing performance issues!

Thanks,

Eric O'Connor

Official comment

Eric O'Connor Fivetranner

May 14, 2025 21:52

Hi Joao,

The `_fivetran_id` is not a random number, but is rather a surrogate primary key that we create by hashing the contents of each row (documented here). This is only done in cases where there is no defined primary key available from the source system.

Since this is a hash, it is not possible to make the output sequential.

It's also not possible for us to MERGE ON a non-primary key value. If `event_time` were a primary key, I believe Snowflake would be able to apply partition pruning even without a truncated date type.

Please reach out directly to support if you're facing performance issues!

Thanks,

Eric O'Connor