Question
How can row-wise compare say a table is identical, but bulk/checksum say it's different?
Environment
Local Data Processing
Description:
Conceptually, the checksum/bulk compare looks at the raw bytes as HVR transports them over the network and only checks whether they are the same. The row-wise compare unpacks these bytes into the correct datatype and then compares the values; it determines which value is bigger, not just whether they are the same.
The two methods could give different results, but this would be an HVR bug. The usual form of the problem is that checksum/bulk compare says a table is different, whereas row-wise compare says it is the same. The opposite problem (checksum/bulk compare says same, whereas row-wise compare says different) has not been seen before and would be unexpected.
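The distinction can be illustrated with a minimal Python sketch. It does not use HVR's actual transport format or algorithms; the byte encodings, the `checksum` and `rowwise_equal` helpers, and the sample data are all assumptions made up for illustration. It only shows how the same values, encoded as different bytes, can fail a byte-level checksum yet pass a typed comparison.

```python
import hashlib
from decimal import Decimal

def checksum(rows):
    """Bulk-style compare (illustrative): hash the encoded bytes of every row."""
    h = hashlib.sha256()
    for row in rows:
        h.update(row)
    return h.hexdigest()

def rowwise_equal(rows_a, rows_b):
    """Row-wise-style compare (illustrative): decode to the datatype, then compare values."""
    if len(rows_a) != len(rows_b):
        return False
    return all(
        Decimal(a.decode()) == Decimal(b.decode())
        for a, b in zip(rows_a, rows_b)
    )

# Hypothetical encodings of the same decimal values on source and target.
source = [b"1.50", b"2.00"]
target = [b"1.5", b"2"]

bulk_same = checksum(source) == checksum(target)
rowwise_same = rowwise_equal(source, target)
print(bulk_same, rowwise_same)  # False True
```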
Answer:
There are various reasons why checksum/bulk compare could say a table is different, but row-wise compare says it's the same.
- 'Noise'. Unused bytes in HVR's transport format could cause a different checksum, even though the values are identical.
- Float datatypes are sometimes 'lossy'. Is 0 the same as 1E-200? Row-wise compare uses a matching algorithm with a tolerance for float rounding inaccuracies.
- Other coercion errors.
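The first two causes can be reproduced in a small Python sketch. The wire layout here (an 8-byte double followed by two unused bytes), the `encode` helper, and the tolerance of 1e-12 are assumptions chosen for illustration; they are not HVR's real format or its real tolerance.

```python
import math
import struct
import zlib

def encode(value, pad):
    # Hypothetical wire format: 8-byte big-endian double plus 2 unused bytes.
    return struct.pack(">d", value) + pad

# 'Noise': identical values, but the unused bytes differ.
a = encode(1.0, b"\x00\x00")
b = encode(1.0, b"\xff\xff")
noise_checksums_differ = zlib.crc32(a) != zlib.crc32(b)
noise_values_equal = (
    struct.unpack(">d", a[:8])[0] == struct.unpack(">d", b[:8])[0]
)

# 'Lossy' floats: 0 and 1E-200 have different bytes but are equal within a tolerance.
x, y = 0.0, 1e-200
float_bytes_differ = struct.pack(">d", x) != struct.pack(">d", y)
float_values_close = math.isclose(x, y, abs_tol=1e-12)

print(noise_checksums_differ, noise_values_equal)  # True True
print(float_bytes_differ, float_values_close)      # True True
```

In both cases a byte-level check reports a difference while a value-level check with a tolerance reports a match.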
Such problems can be troubleshot by following the steps below.
When such a problem is detected, it is important to get a specific test case back to HVR Technical Support.
- First, make a channel with only the specific table.
- 'Chop' it down to the key columns and a column with the false difference. Columns can be chopped either by removing them in the GUI or by adding ColumnProperties/IgnoreDuringCompare.
- Keep doing that until the number of columns has been minimized.
- Finally, chop down the number of rows. This can be done by defining action { group=* table=tab1 Restrict /RefreshCondition="{k} >= {hvr_var_k_min} and {k} <= {hvr_var_k_max}" /Context=chop }
- Use HVR Compare with the 'chop' context enabled (in the "Contexts" tab) and experiment with min & max values until the diff is localized.
- Send a dump of the bad rows and the channel definition to HVR Technical Support.
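Experimenting with the min and max values amounts to a bisection over the key range. The Python sketch below shows that idea only; it does not call HVR. The `differs(lo, hi)` callback is a hypothetical stand-in for one compare run restricted to keys between `lo` and `hi`, and the in-memory `source` and `target` tables are made-up sample data. It assumes an integer key and at least one differing row in the starting range, and it localizes one differing key.

```python
def localize(differs, k_min, k_max):
    """Shrink [k_min, k_max] to a single key whose row differs.

    differs(lo, hi) must return True if any row with lo <= k <= hi differs.
    """
    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if differs(lo, mid):
            hi = mid          # a difference lies in the lower half
        else:
            lo = mid + 1      # otherwise it must lie in the upper half
    return lo

# Made-up tables keyed by k; exactly one row is different on the target.
source = {k: k * 10 for k in range(1, 1001)}
target = dict(source)
target[637] = -1

def differs(lo, hi):
    # Stand-in for a compare run limited to the key range lo..hi.
    return any(source[k] != target[k] for k in range(lo, hi + 1))

bad_key = localize(differs, 1, 1000)
print(bad_key)  # 637
```

With 1000 keys this needs about ten restricted compare runs rather than a scan of every row.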