Tag: HVR version 5.0.4 and higher
How to create multiple integrate jobs in the same channel for parallelism using the /Absent parameter
When parallel integrate jobs are needed in the same channel, for example to divide the load among multiple integrate jobs, the following example shows how to set this up.
The example below is a channel that captures from an Oracle database and integrates into Oracle, SQL Server, and Amazon S3 buckets. The channel consists of 1 capture job for all tables in an Oracle schema and 3 parallel integrate jobs that send the data to Oracle, SQL Server, and the S3 bucket.
In the example below, the first integrate job (multiintegra-integ-olx) integrates all tables into the Oracle target. The second job (multiintegra-integ-snw2) integrates all tables except the ones marked as Absent in the channel, and the same applies to the third integrate job (multiintegrate-integ-s3).
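The effect of marking tables as Absent can be pictured with a small Python sketch. This is only a conceptual model, not HVR syntax or HVR code: the table names and the choice of which table is absent at which location are hypothetical, and the location keys (olx, snw2, s3) are taken from the suffixes of the job names above.

```python
# Conceptual model only: which tables each integrate job would handle.
# Table names and the Absent assignments are hypothetical examples.
ALL_TABLES = ["orders", "customers", "audit_log"]

# Tables marked as Absent per integrate location (assumed for illustration).
ABSENT = {
    "olx": set(),            # Oracle target: nothing absent, all tables integrated
    "snw2": {"audit_log"},   # second target: one table excluded
    "s3": {"customers"},     # S3 target: a different table excluded
}

def tables_for(location):
    """Return the tables an integrate job for this location would process."""
    return [t for t in ALL_TABLES if t not in ABSENT[location]]

for loc in ABSENT:
    print(loc, tables_for(loc))
```

Each integrate job only sees the tables that are not marked as Absent for its location, which is how the load can be divided among the jobs.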
Additionally, when integrating into Amazon S3 buckets, small files should be avoided because of their performance impact. The integrate jobs that handle multiple tables therefore have the action /Integrate /OrderByTable, which ensures the data is sorted so that only a single target file per source table is created in an integrate cycle. If the data were not sorted, many small target files would be created per source table. To minimize the number of integrate cycles, CycleByteLimit is set to 0, which means all transaction files created by capture are processed in 1 cycle instead of in (default) 10 MB chunks.
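Why sorting by table reduces the number of target files can be illustrated with a simplified Python model. It assumes a writer that starts a new file every time the table name of the incoming change differs from the previous one; this is an assumption made for illustration, not a description of HVR internals, and the table names and change list are hypothetical.

```python
from itertools import groupby

def count_target_files(changes):
    """Count files under the assumed rule: a new file starts whenever
    the table name differs from that of the previous change."""
    return sum(1 for _ in groupby(changes, key=lambda change: change[0]))

# Hypothetical captured changes as (table, sequence) pairs, interleaved
# in commit order as they might arrive within one integrate cycle.
changes = [
    ("orders", 1), ("customers", 1), ("orders", 2),
    ("customers", 2), ("orders", 3),
]

unsorted_files = count_target_files(changes)

# sorted() is stable, so the order of changes within each table is preserved.
sorted_files = count_target_files(sorted(changes, key=lambda change: change[0]))

print(unsorted_files, sorted_files)
```

In this model the interleaved input yields one file per run of consecutive changes to the same table, while the sorted input yields exactly one file per table.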
The use of /TableProperties /Absent is supported in HVR version 5.0.4 and higher.
The channel for the above setup looks as follows: