Other: HVR Snapshots
We would like to get the most recent table data, using CDC for capture but integrating only the latest record. For example, if table A has a record with PK = 1 that is updated 100 times between 9 AM and 5 PM, integration should apply only the latest (last) change and ignore the previous 99.
-
Hi Nathaniel,
The capability you are asking for is referred to as coalescing in HVR. When HVR performs Burst integration (e.g. into Snowflake, Databricks, BigQuery etc.), you will see in the log that coalescing took place. Coalescing condenses, for example, 100 changes to the same row within a cycle down to a single net change.
Note that updates may change different columns, and our software is designed to merge all changes so you get the accurate current row value.
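To make the idea concrete, here is a minimal Python sketch of per-key coalescing. The `Change` record, its `op` values and the `coalesce` function are hypothetical illustrations, not HVR's internal format or implementation; the sketch assumes changes arrive in commit order and form a valid insert/update/delete sequence.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Change:
    """One captured change (hypothetical shape, not HVR's internal format)."""
    pk: int
    op: str                   # "insert", "update" or "delete"
    columns: dict[str, Any]   # only the columns this change touched


def coalesce(changes: list[Change]) -> dict[int, Change]:
    """Condense an ordered list of changes into one net change per primary key."""
    net: dict[int, Change] = {}
    for ch in changes:  # assumed to be in commit order
        prev = net.get(ch.pk)
        if prev is None:
            net[ch.pk] = Change(ch.pk, ch.op, dict(ch.columns))
        elif ch.op == "update":
            # Later values win column by column; untouched columns keep earlier values
            prev.columns.update(ch.columns)
        elif ch.op == "delete":
            if prev.op == "insert":
                del net[ch.pk]  # inserted and deleted in the same cycle: no net change
            else:
                net[ch.pk] = Change(ch.pk, "delete", {})
        else:
            # An insert after a delete of an existing row nets out to a full-row update
            net[ch.pk] = Change(ch.pk, "update", dict(ch.columns))
    return net


if __name__ == "__main__":
    # 99 updates to the same column, plus one update to a different column
    stream = [Change(1, "update", {"updated_at": f"t{i:03d}"}) for i in range(99)]
    stream.insert(10, Change(1, "update", {"status": "closed"}))
    net = coalesce(stream)
    print(len(stream), "->", len(net))  # 100 -> 1
    print(net[1].columns)               # {'updated_at': 't098', 'status': 'closed'}
```

The merged row keeps the last `updated_at` value and the `status` change from earlier in the stream, which is the "accurate current row value" behaviour described above.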
So in your case you would run your capture process continuously, with integrate scheduled to run at or after 5 PM. Set CycleByteLimit to 0 so integrate processes all changes in one cycle (note that, depending on change volume and table definitions, this may use a lot of memory and/or spill to disk, so it can take a while).
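Why CycleByteLimit matters can be sketched the same way: coalescing works within a cycle, so if pending changes are split across several cycles, each cycle can still deliver its own net change for the same key. The sketch below assumes a simplified greedy split by byte size and treats a limit of 0 as "no limit"; `plan_cycles`, `net_rows_applied` and the 200-byte change size are hypothetical, not how HVR actually sizes its cycles.

```python
def plan_cycles(changes: list[tuple[int, int]], cycle_byte_limit: int) -> list[list[tuple[int, int]]]:
    """Split an ordered list of (pk, size_in_bytes) changes into integrate cycles.

    A limit of 0 is treated as 'no limit', so all pending changes land in one cycle.
    """
    if cycle_byte_limit == 0:
        return [list(changes)] if changes else []
    cycles, current, used = [], [], 0
    for pk, size in changes:
        # Start a new cycle when adding this change would exceed the byte budget
        if current and used + size > cycle_byte_limit:
            cycles.append(current)
            current, used = [], 0
        current.append((pk, size))
        used += size
    if current:
        cycles.append(current)
    return cycles


def net_rows_applied(cycles: list[list[tuple[int, int]]]) -> int:
    """Coalescing happens per cycle, so each key costs at most one row per cycle."""
    return sum(len({pk for pk, _ in cycle}) for cycle in cycles)


if __name__ == "__main__":
    stream = [(1, 200)] * 100  # 100 updates to PK = 1, assumed 200 bytes each
    for limit in (0, 1000):
        cycles = plan_cycles(stream, limit)
        print(limit, len(cycles), net_rows_applied(cycles))
    # 0 1 1
    # 1000 20 20
```

With no limit, the 100 changes yield one applied row; with a 1000-byte budget, the same stream is applied as 20 rows across 20 cycles.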
Is this what you are looking for?
Thank you,
Mark
-
Hello Mark,
Thank you for your comment; however, we did try this and also engaged Fivetran support, and were unable to get this to behave the way we need. The reason is the timing of the changes. If we have 100 changes over a day (say 8 AM to 6:30 AM the next morning), when the integration job runs it pulls in all 100 changes, even with Coalesce enabled and CycleByteLimit set to 0. If all 100 changes are on the same field (i.e. an update date), we only want the latest, not all 100. Maybe that wasn't clear when I first posted this; apologies. Please provide any other info you may have (including Teleport Sync, which I heard about during the Fivetran tech event in Boston a couple of weeks ago). Thanks!