- When does HVR use temp files?
- Where do temp files go?
- How big are temp files?
HVR tries to 'stream' data, but there are some moments when it has to create temp files.
HVR"s temp files are created in $HVR_TMP, but if that is not defined then they are located in $HVR_CONFIG/tmp/
HVR uses temp files at various moments. The following is an \[incomplete\] list,
During log-based capture, if a transaction exceeds a limit (all limits are detailed below) then the capture job will spill it to disk on the capture machine.
Row-wise compare and refresh sometimes need to sort data outside the DBMS. If the data in a table exceeds a limit then this data will be spilled into multiple sort-merge files on the target machine.
During row-wise refresh, if the number of changes detected exceeds a limit then these changes will be spilt to disk.
During row-wise verbose compare or refresh from the GUI, if the number of changes detected exceeds a limit then these changes will be spilt to disk.
If a single LOB value (e.g. long varchar or CLOB) exceeds a limit then its contents will be spilt to disk.
Action Transform/ with parameter /Command directs the output of a command transform to a temp file and then reads it back.
When bulk-loading data into Greenplum or Redshift, HVR puts the data into a 'staging' file first.
The size of these files depends on the data that HVR is processing (obviously).
Several of the above cases say that HVR spills to disk if it 'exceeds a limit'.
For cases 1, 3, 4 & 5 this limit is controlled by env-variable $HVR_TEMP_SPILL. This controls the number of bytes for the temporary 'blobs' in HVR's memory; these will spill to disk if their size exceeds (X-T)/N, where X= $HVR_TEMP_SPILL, T=total_size_of_currently_cached_temp_objects and N=num_open_cached_objects. The default of $HVR_TEMP_STORE is 64M (bytes). This means that, by default, a single BLOB will spill to disk if it's bigger than 32M.
For case 2 (sorting), the 'spill limit' is controlled by env-variable $HVR_SORT_BYTES_LIMIT (default 512Mb) and $HVR_SORT_ROW_LIMIT (10M). If data exceeds one of these amounts then HVR's sort will spill to disk. But it will never spill if the number of rows is below $HVR_SORT_ROW_MIN (default 128).
HVR's native \[uncompressed\] format was originally quite inflated because varchar(1000) containing "hello world" is held as 1000 bytes, not as 11 bytes. So, for older HVR versions, for cases 1, 2, and 3 above (log-based capture transactions, sorting, and row-wise changes), the temp file was sometimes much bigger than the original table data.
To combat this 'inflation', since 4.6.1 HVR compresses most of its temp files. In HVR 4.6.1, temp files for cases 1, 3, 4 & 5 are compressed unless $HVR_TMP_NO_COMPRESS=1 is set.. Since HVR 4.6.2, temp files for cases 1, 3, 4 & 5 are compressed unless $HVR_TMP_COMPRESS_LEVEL=0 is set. Before HVR 4.6.2 files for case 2 (sorting during row-wise) are only compressed if HVR_SORT_COMPRESS=1 is set. Since 4.6.2 files for case 2 are compressed unless $HVR_SORT_COMPRESS_LEVEL=0 is set.