Other: HVR Compare/Repair Enhancement

The problem statement. When we run a compare, it takes about 12 minutes to churn thru 3.8 million rows. Currently this is a small test of 12 tables to test feasibility. The compare generates the diff file and we know where data is different. Then using row-wise refresh looks like it will run for approx 9 days for 3 of the 12 tables in the channel.

The documentation suggests that the compare generates the needed inserts/updates/deletes needed to bring the data back in-sync (image below along with link). Why wouldn't the row-wise refresh take advantage of this OR if it does, why does it not use the same mechanism by which the integrate operates?

Comparing this to our current solution: (admittedly from Oracle to Oracle or SQL Server to Oracle) - Veridata. This determines the diffs, generates the insert/update/deletes and issues them when we do a "repair".

If we are to use these data for reporting we need audit compliance (i.e. SOX). Nine days for a very small subset of our data simply isn't going to work. This also increases our latency for any changes in the router. I'm really hoping I just have a fundamental misunderstanding with regards to how compare/refresh is supposed to operate.

Additionally, I would also expect to see some way to initiate a "repair" refresh after a compare job finishes, leveraging only the diffs captured by the compare. The way the application seems to work:

Run compare and see the output with no functionality beyond "oh, hey, my data is out-of-sync".
Run a row-wise refresh that runs another compare, determines the diffs and generates the insert/update/delete statements on the fly.

Image and Link to doc:

https://fivetran.com/docs/hvr6/getting-started/concepts/compare#comparetypes

Community

Other: HVR Compare/Repair Enhancement

Comments