How much system resources will Local Data Processing consume?
Local Data Processing
There are three cases to consider: HVR on the source machine, on the hub, and on the target machine.
Source for Log-Based Capture (Not the Hub)
Every channel will use up to one CPU core on the system. If HVR is running behind and there is no bottleneck accessing the transaction logs or in memory, then HVR may use up to a full CPU core per channel. On a running system with HVR reading the tail end of the log, the CPU consumption per channel is generally far lower than 100% of a CPU core. Note that much of HVR's CPU utilization goes into compressing the transaction files. Compression can be disabled with an environment variable to lower CPU utilization, but this will in turn increase network utilization (between the source HVR agent installation and the hub, and between the hub and any target HVR installations). Refresh and compare, which are of course not run on an ongoing basis, will add as many processes as the number of tables refreshed/compared in parallel. In general, the HVR process itself uses relatively few resources, but the associated database job that retrieves the data uses many (and if the database parallelizes the select, then the refresh or compare can easily use up to 100% of all CPU on the source database).
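As a rough illustration of the process counts described above, the sketch below is a simple back-of-envelope calculation, not an HVR formula; the channel and table counts are made-up example inputs.

```python
# Illustrative count of HVR processes on a source machine.
# One capture process per channel, plus one extra process per table
# being refreshed/compared in parallel (only while that job runs).

def source_process_count(channels: int, parallel_refresh_tables: int = 0) -> int:
    """Steady-state capture processes plus temporary refresh/compare processes."""
    return channels + parallel_refresh_tables

# Example: 3 channels capturing, with a refresh running 4 tables in parallel.
print(source_process_count(3, parallel_refresh_tables=4))  # 7 processes
```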
Memory consumption is very modest: up to 64 MB per transaction per channel, after which HVR starts spilling to disk. Generally, 64 MB for a transaction is not reached and much less memory is used, but this depends on the size of the transactions and what portion of each touches tables that are part of a channel. In a typical environment you will see far less than 1 GB of memory per channel used by HVR. Note that the 64 MB threshold can be adjusted (upwards or downwards) using an environment variable.
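To make the worst case concrete, here is a hedged back-of-envelope sketch. The only figure taken from the text is the 64 MB per-transaction spill threshold; the channel and open-transaction counts are illustrative assumptions, and real usage is typically far lower.

```python
# Worst-case capture memory estimate on the source.
# 64 MB is the per-transaction spill threshold mentioned above (adjustable
# in HVR via an environment variable); other inputs are example assumptions.

SPILL_THRESHOLD_MB = 64  # per open transaction, per channel

def capture_memory_mb(channels: int, max_open_txns_per_channel: int,
                      threshold_mb: int = SPILL_THRESHOLD_MB) -> int:
    """Upper bound: every open transaction buffers up to the threshold."""
    return channels * max_open_txns_per_channel * threshold_mb

# Example: 4 channels, at most 10 concurrently open transactions each.
print(capture_memory_mb(4, 10))  # 2560 MB upper bound
```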
The HVR installation is about 100 MB in size, and while running CDC it uses no additional disk space until the 64 MB per-transaction threshold described above is exceeded and HVR starts spilling transactions to disk. HVR writes compressed files, but in rare cases, with large batch jobs modifying tables in the channel that only commit at the end, HVR may write a fair amount of data to disk. Start with at least 5 GB for HVR_CONFIG. Please note that HVR Compare may also spill to disk, which would also go into this area. If the transaction logs are backed up aggressively, so that they become unavailable to the source database, then consider using hvrlogrelease to take copies of the transaction logs until HVR no longer needs them. This can add a lot of storage space to the requirements, depending on the log generation volume of the database and on how long transactions may run (whether they are idle or active makes no difference here).
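The extra space hvrlogrelease can require is roughly the log generation rate multiplied by how long logs must be kept. The sketch below is an illustrative sizing calculation under assumed inputs, not an HVR utility; replace the figures with your own database's numbers.

```python
# Rough sizing for additional HVR_CONFIG storage when hvrlogrelease keeps
# copies of transaction logs. Inputs are illustrative assumptions.

def logrelease_storage_gb(daily_log_gb: float, retention_days: float) -> float:
    """Storage ~ log generation rate x retention window (driven by capture
    lag and the longest-running open transaction)."""
    return daily_log_gb * retention_days

BASELINE_GB = 5  # starting HVR_CONFIG suggestion from the text

# Example: 20 GB of logs per day, retained for up to 2 days.
print(BASELINE_GB + logrelease_storage_gb(daily_log_gb=20, retention_days=2))  # 45.0
```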
Every channel performs frequent IOs against the transaction logs. If HVR is current, then every one of these IOs is on the tail end of the log, which – on old systems – could be a source of contention (especially with many channels). Modern systems have a file system or storage cache, and these frequent IOs should barely be noticeable.
Hub
Every HVR job will spawn a process – i.e. one for every capture, one for every integrate. CPU utilization for each of these processes on the hub is generally very low unless heavy transformations are processed on the hub (i.e. depending on the channel design). In addition, refresh or compare may spawn multiple processes when running, and a row-by-row refresh/compare can use a lot of CPU.
Memory consumption is generally a little higher on the hub than on the source, but still fairly modest. Some customers run dozens of channels on a dedicated hub with a fairly modest configuration. Row-by-row refresh and compare may use a lot of memory, but they are not run on an ongoing basis.
Storage utilization on the hub can be high. If capture is running but integrate into at least one destination is not, then HVR will accumulate transaction files on the hub. These files are compressed, but depending on the activity on the source database and on how long it takes before the destination starts processing transactions, a lot of storage space may be used. Start with at least 10 GB, and possibly more if the hub manages multiple channels or network connectivity is unreliable. Large row-by-row refreshes or compares can also use a lot of storage space.
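The backlog that builds while a destination is down grows at roughly the compressed capture rate. The following is a hedged illustration under assumed throughput and outage figures, not an HVR-provided calculation.

```python
# Illustrative hub storage backlog while integrate is not draining files.
# Compressed throughput and outage duration are example assumptions.

def hub_backlog_gb(compressed_gb_per_hour: float, outage_hours: float) -> float:
    """Transaction files accumulate at roughly the compressed capture rate
    for as long as integrate into the destination is stopped."""
    return compressed_gb_per_hour * outage_hours

# Example: 0.5 GB/hour of compressed transaction files, 24-hour outage.
print(hub_backlog_gb(0.5, 24))  # 12.0 GB on top of the 10 GB baseline
```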
If HVR is running CDC and keeping up with transaction log generation on a busy system processing many small transactions, then transaction files will be created at a rapid pace. Make sure the file system can handle frequent IOs; typically a storage system cache, a file system cache, an SSD, or a combination of these takes care of this.
Target
On the target, HVR typically does not use a lot of CPU resources, but the database session it initiates does (this also depends somewhat on what transformations, if any, run as part of the channel definition). A single Integrate process will have a single database process that can easily use a full CPU core. Multiple channels into the same target will each add one process (unless specifically configured to split into more than one Integrate process). Compare/refresh can use more cores depending on the parallelism in HVR, and the associated database processes may use more than one core each depending on any parallelism settings at the database level.
Memory consumption for HVR on the target is very modest unless large transactions have to be processed. Typically far less than 1 GB per Integrate process is used. Row-by-row refresh and compare can use GBs of memory, but they are not run on an ongoing basis.
HVR_CONFIG on the target may store temporary files for row-by-row compare or refresh, and if tables are large, a significant amount of space may be required. Start with 5 GB.
IO performance for HVR on the target is generally not critical.
If a system combines the roles of source, hub, and/or target, then add the resource consumption across these roles.
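Adding consumption across roles can be sketched as a simple sum of the per-role baselines. The figures below are the HVR_CONFIG starting suggestions from this article; real requirements depend on channel count, activity, and retention.

```python
# Hedged sketch: summing per-role HVR_CONFIG baselines when one machine
# plays several roles. Baselines are the starting suggestions from the text.

BASELINE_DISK_GB = {"source": 5, "hub": 10, "target": 5}

def combined_disk_gb(roles):
    """Add the HVR_CONFIG baseline for every role the machine performs."""
    return sum(BASELINE_DISK_GB[r] for r in roles)

# Example: a machine acting as both source and hub.
print(combined_disk_gb(["source", "hub"]))  # 15 GB starting point
```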