Destination Improvement: During resync, _fivetran_synced timestamp should update for records that are deleted.
The Problem
Currently, there is a discrepancy in how Fivetran handles system columns depending on the sync type:
-
Incremental Crawls: When a record is deleted,
_fivetran_deletedis set totrueand_fivetran_syncedis updated to the current sync time. -
Resyncs: When a record is detected as deleted,
_fivetran_deletedis set totrue, but_fivetran_syncedretains its old timestamp.
The Impact
This inconsistency breaks downstream incremental models (e.g., dbt). Many data teams use _fivetran_synced as the "watermark" to pull changed records. Because the timestamp doesn't update during a resync, these soft-deletes are missed by downstream processes, leading to data state mismatches between the source and the warehouse.
-
Official comment
Hi Shanmuga,
This behavior is documented here. The reason that our system has this behavior is because during a resync we initially mark all previous rows as deleted, and then load new rows into the table as they exist in the source as of the time the resync is initiated. If a deletion happens during this window, we have no way of detecting it, since you can think of a resync as logically starting from scratch after that deletion occurred.
incremental → delete(key=7) → incremental
↑
[ key=7 deleted ]
incremental → delete(key=7) → resync
↑
[ key=7 not present ]The scenario you describe would primarily affect rows that were deleted between the time the connector last ran, and the time the resync was initiated. One way you could mitigate this issue is to materialize a table of active rows (say, by primary key), and use that table to detect deletions when you resync a table.
Please sign in to leave a comment.
Comments
1 comment