Destination Improvement: Improve HVR Refresh with Slicing on very large tables
AnsweredWe have to use a boundary slice on a very large table.
The ID field is a hexadecimal id. We have to define a list of ID's to slice on, we are limited to 199 slices (200 total slices) If we are doing a targeted refresh, (not all records, where we are refreshing slices 149-190 as an example) and ignoring deleted records (soft delete), during the delete step of the refresh, HVR uses delete statement that tries to delete all records from the ID in slice 149 through the ID in slice 191. In a table this size, this takes days to complete and we have many times ran into resource issues due to the number of records being deleted at one time. To resolve this issue, we have to manually delete the records using SQL commands but do it in batches instead of all at one time. This performs exponentially better than on bulk delete. (i.e. hours vs days). Then when we run the HVR refresh, HVR still goes through the delete process but since the records are already deleted, it moves onto the next step reasonably quickly. Current design of HVR during a slice refresh is to delete all records in one transaction and then insert them in batches (slices) It is my recommendation that HVR be designed to both delete and insert records based on the slice conditions. Instead of one transaction to delete, it should run a delete for each slice as individual transactions.*
-
Hai,
Delete in multiple chunks would break our transactional consistency and leaves partial data in the target table if the refresh fails in the middle of the delete process. Arguably that is not a big deal, given the customer wants to refresh the data anyway. However, this is not how HVR works at this time.
We are working on a project to improve refresh performance (with or without slicing). This project (RD-494661) will parallelize writers in the HVR core to speed up the write side of the query. This parallelization works with or without slicing. Depending on bottlenecks this feature may allow a refresh that is sliced today to no longer be sliced thanks to this feature. If so then you could leverage truncate followed by load.
RD-494661 is planned to be released as 6.2.5/9 in August 2025.
Please let me know whether you think this feature may address the concern or not.
Thanks,
Mark. -
Mark, unfortunately that new feature doesn't help in their case.
1) The customer is using "soft delete" so even without slicing, they still need a delete operation with a WHERE clause so it does not remove the rows in the target that are soft-deleted.
2) With the very large tables they are selecting specific slices based on a determined “rollback” date so they may only refresh between a certain date forward. In this scenario, they would only be deleting specific boundary ranges as well as hvr_is_deleted=0. On very large tables, even with specific slices selected, this may still require a gigantic bulk delete.It seems like we are going to encounter large tables such as this across more of our customer base as their data grows, so a feature to support it could benefit more than just this one customer.
-
Can we set up a conversation with the customer please?
Thank you,
Mark.
Please sign in to leave a comment.
Comments
3 comments