Introduction
The concept of database auditing is to keep track of changes made to data, when the change was made, and (optionally) who made the change. Some organizations face regulatory requirements to keep an audit trail of data. Change Data Capture (CDC), the foundation of HVR's data replication capabilities, is an excellent starting point to keep an audit trail of data changes, either in the same database or in a separate system. HVR users can now maintain an audit trail by using HVR's hvr_op and a predefined feature TimeKey. This article primarily focuses on keeping audit trail for manipulated data. But if you are interested in keeping just the deleted data please refer to the page How to use Soft Delete function to keep track of deleted rows?
A Slowly Changing Dimension (SCD) is a data warehouse concept very similar to an audit table. HVR's auditing capabilities can also be used to implement different types of SCDs. For an explanation of the different types of SCDs please refer to Wikipedia: https://en.wikipedia.org/wiki/Slowly_changing_dimension.
This expert note explains the changes to be made to the channel in order to implement data SCDs/auditing. The changes are generic across the different databases. Generic HVR concepts are not included in this expert note. E.g. HVR Refresh is used to initially populate the SCD/audit table (if needed), and HVR Load is used to implement the run time environment for the channel.
The example below describes steps to implement SCD type 6 in Oracle source to Oracle target.
Configuration
Below are the actions that need to be defined on the source and target to implement SCD type 6 for the environment. Some of the extra columns are optional for a regular auditing table.
Actions on the source
Action |
Parameter |
Description |
DbCapture |
/LogBased |
The /LogBased option is recommended by HVR because this improves the efficiency of the replication by directly capturing data from the DBMS logging system and making sure all changes are captured. |
Actions on Target
Other than DbIntegrate the other actions defined on Target are listed below.
Action |
Parameter |
Description |
ColumnProperties |
/Name=capturetime /Extra /IntegrateExpression={hvr_cap_tstamp} /Datatype=timestamp /Timekey |
This action will add extra column called capturetime with datatype timestamp to the target tables which will view the system time when the data was captured by HVR. The option /Timekey will convert all the changes to insert in the target tables. For example, even if a row is deleted in the source it will be inserted in the target as a deleted row. The /Timekey option must be used for at least one of the columns on the target to maintain the audit trail. |
ColumnProperties |
/Name=change /Extra /IntegrateExpression{hvr_op} /Datatype=integer |
This action will add an extra column called change with a datatype integer to the target tables which will indicate the type of change made to the data. HVR has predefined values for the changes (delete=0 , insert=1 , update=2 ). |
ColumnProperties |
/Name=seq_no /Extra /IntegrateExpression="(select NVL(max(seq_no),0)+1 from [table] where [primary key]={[primary key]})" /Datatype=integer |
This action will add an extra column called seq_no with datatype integer to the target tables which will get incremented every time a change is made to a particular row with same Primary Key based on the source. For performance reasons consider creating a non-unique index on the primary key on the SCD/auditing table in the target database. |
The image below shows what the actions should look like.
Comparability
The HVR Compare feature can be performed to see if the source and target are in sync by adding below actions to the target database. For example, you want to compare supplier table in source and target, the action that you need to add is Restrict action on the target group with a /CompareCondition as below.
{seq_no} = (select max(seq_no) from [table] where [primary key] = {[primary key]}) and {change} not in (0,4))
This expression excludes deletes and before row changes ({change} not in (0,4) ) and use the most recent row image. /CompareCondition should look like the image below.
If you want this condition to only apply in some and not in all cases then consider setting the context to any name to identify when you want to perform the customized /CompareCondition. In our example it is 'comparescd'. The context is used by selecting the context under Contexts tab in the HVR Compare dialog.
Integrate Option /Burst
HVR Integrate provides an option /Burst to optimize incremental integration into columnar and/or massively parallel databases. Use of the option will automatically coalesce changes within a single transaction (note that /Coalesce is a separate option). In the regular (non-/Burst , non-/Coalesce ) mode the changes are applied in the order they were performed in the integrate cycle and all intermediate stages are inserted into the target tables with the corresponding value in the change column (1- insert, 2-update, 0-delete). For example, if an update is performed on a row and a delete is performed on the same row within the same integration cycle, the regular mode will show 3 rows. The change column will show before update operation (change=4), update operation (change=2) and delete operation (change=0). The sequence number shows the order in which the changes occurred. Below is a screenshot for the source side.
Below is the screenshot for the changes on target side will in regular (non-/Burst , non-/Coalesce ) mode.
In /Burst mode the changes are applied to the target in bulk and implicitly coalesced within an integration cycle. For example, if an update and delete are performed on the same row in one integrate cycle, only 1 new row is going to be added with the latest change.
This screenshot below shows the source side.
This screenshot below shows the target side when running in /Burst mode.
Note: /Burst mode for just one table in a single channel automatically gets applied to all the tables in a channel. So if only a subset of tables are desired to be in burst mode, creating a separate channel is recommended.
SCD Type 6 Output
Slowly Changing Dimension type 6 (https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_6_.2F_hybrid) is a hybrid between SCD 2 and 3. The changes captured by HVR in this expert node provide sufficient information to retrieve SCD type 6 data. This example adds an extra column current_flag to the target table indicating the current record and the old record, and also keeps the validity date of the old data in case of auditing purpose by the DBAs. The query below, run against the target table, provides validity dates for data rows:
select supplier_code, supplier_name, capturetime as valid_from, lead(capturetime,1,null) over (partition by supplier_code order by seq_no) as valid_to from supplier where change <> 4 ;
To implement SCD type 6 consider implementing a view similar to the following:
create or replace view view_supplier as select supplier_code , supplier_name , capturetime as valid_from , lead(capturetime,1,null) over (partition by supplier_code order by seq_no) as valid_to , nvl2(lead(capturetime,1,null) over (partition by supplier_code order by seq_no),'N','Y') as current_flag from supplier where change <> 4 ;
Running below query on the view shows:
select supplier_code, valid_from, valid_to, current_flag from view_supplier ;
- Posts