Connector Improvement: Update the Fivetran sync column only for records that have actually changed
Not planned
It has been observed that for file-based connectors, even when there is no change in a record on the source side, Fivetran still syncs that record and updates the Fivetran sync column with the latest sync timestamp. This creates a challenge for developers who need to identify which records actually changed.
For example, if an SFTP file contains 10 million records and only 1 record is updated, Fivetran still updates the Fivetran sync timestamp for all 10 million records after the sync. As a result, developers cannot determine which records were truly modified, making it impossible to implement a proper incremental load. Ideally, the Fivetran sync value should be updated only for the single changed record, not for all 10 million.
Based on our conversation with Fivetran Support, Fivetran maintains a hash for each row, so the capability to detect changes already exists in the backend. It would be helpful if the Fivetran sync value were updated only for updated or inserted records at the destination level, to support true incremental change detection.
Official comment
Hi Atul,
Thank you for submitting this request. I understand the goal: you want the Fivetran sync column to update only for records that truly changed.
Because file sources don't include a record-level modified timestamp, Fivetran's file connectors cannot offer this specific behavior without performance degradation. The hash you are referring to is sampling-based and not something we can use for this purpose.
For true incremental behavior with file data, the most reliable options are:
- Exporting incremental slices from the upstream system, if supported, or
- Implementing destination-side post-processing that isolates the actual changes needed for downstream pipelines.
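As a rough illustration of the second option, here is a minimal sketch of snapshot-versus-target change detection using a per-row hash. This is not a Fivetran feature: the function names `row_hash` and `apply_snapshot`, the in-memory `target` dictionary, and the `record_modified` field are all hypothetical and only stand in for whatever the destination actually uses.

```python
import hashlib
from datetime import datetime


def row_hash(row: dict, key: str) -> str:
    """Hash every non-key column, in a stable column order."""
    payload = "\x1f".join(
        f"{col}={row[col]!r}" for col in sorted(row) if col != key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_snapshot(target: dict, snapshot: list, key: str, now: datetime) -> list:
    """Merge a full snapshot into `target` and return the keys that changed.

    `target` maps primary key -> {"hash", "row", "record_modified"}
    (a hypothetical layout chosen for this sketch).
    """
    changed = []
    for row in snapshot:
        h = row_hash(row, key)
        existing = target.get(row[key])
        if existing is not None and existing["hash"] == h:
            continue  # unchanged row: leave record_modified untouched
        # New or modified row: store it and stamp the modification time.
        target[row[key]] = {"hash": h, "row": row, "record_modified": now}
        changed.append(row[key])
    return changed
```

Calling `apply_snapshot` once per sync returns only the inserted or updated keys, so downstream jobs can filter on `record_modified` instead of the sync timestamp. The sketch does not handle rows deleted from the source, and at warehouse scale the same comparison would more likely be written as a hash join or `MERGE` in the destination than in Python.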
Thanks,
Parmeet

Hi Parmeet,
Thank you for your response.
We are requesting this feature for the following reasons:
- Fivetran already checks for changes at the primary key level. If no change is detected, no update is performed and the row does not count toward MAR consumption.
- Since Fivetran can already determine when a record has not changed, we are wondering why the FIVETRAN_SYNCED column still needs to be updated in those cases. If this column is required to reflect the latest run, could we instead introduce another column (such as FIVETRAN_RECORD_MODIFIED) that is updated only when an actual change occurs in the record?
Without this capability, we would be forced to compare millions of incoming records (we receive a full snapshot on each incremental run) with a target table containing tens of millions of rows to calculate deltas in Snowflake. This would significantly increase our Snowflake compute costs.
Please let me know if you have any questions.
Thanks,
Manjeeth