Problem Summary: How to monitor HVR jobs using SNMP integration
HVR comes with a utility, HVR Maint, that serves two purposes: maintenance and monitoring.
For monitoring, it can be configured to, for example, scan the HVR log files for errors, check the HVR Scheduler status, and check replication latency thresholds. When one of these checks finds a problem, an alert can be sent, either by email or as an SNMP notification. The latter supports integration with enterprise monitoring systems.
In order to enable SNMP integration, HVR Maint needs to be set up with the following options:
# -snmp_notify      Send SNMP v1 traps or v2c notifications.
#                   The -snmp_community option is required.
#                   See $HVR_HOME/lib/mibs/HVR-MIB.txt
#
# -snmp_version     Specify '1' or '2c' (default)
#
# -snmp_heartbeat   Send a hvrMaintNotifySummary notification, even if
#                   there was nothing to report.
#
# -snmp_hostname    SNMP agent hostname. Defaults to localhost.
#
# -snmp_port        SNMP agent trap port. Defaults to port 162.
#
# -snmp_community   Community string for SNMPv1/v2c transactions.
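The constraints in the option descriptions above (defaults, allowed versions, and the community string requirement) can be checked before a configuration is rolled out. The following is a minimal sketch only: SnmpSettings and check_snmp_settings are hypothetical helper names, not part of HVR, and the code does not read or write an actual HVR Maint option file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnmpSettings:
    """Hypothetical container mirroring the -snmp_* options described above."""
    notify: bool = False             # -snmp_notify
    version: str = "2c"              # -snmp_version: '1' or '2c' (default)
    heartbeat: bool = False          # -snmp_heartbeat
    hostname: str = "localhost"      # -snmp_hostname default
    port: int = 162                  # -snmp_port default
    community: Optional[str] = None  # -snmp_community


def check_snmp_settings(s: SnmpSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if s.version not in ("1", "2c"):
        problems.append("snmp_version must be '1' or '2c'")
    if s.notify and not s.community:
        problems.append("snmp_notify requires snmp_community")
    if not 1 <= s.port <= 65535:
        problems.append("snmp_port must be between 1 and 65535")
    return problems
```

For example, check_snmp_settings(SnmpSettings(notify=True)) reports the missing community string, while adding a community value yields an empty list.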
In more detail here is a list of the SNMP notifications that we send:
hvrMaintNotifySummary; number: 1
hvrMaintNotifyError; number: 2
hvrMaintNotifyLatency; number: 3
hvrMaintNotifyJobError; number: 4
The subids of hvrMaintNotif are the notifications sent by HVR. With SNMP v1 they are sent with enterprise-id "hvrMaint".
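On the receiving side, the numbers listed above can be used to translate an incoming notification into its name. The sketch below assumes that the receiver exposes the notification OID as a dotted string and that its final arc is the number from the list; both points should be verified against HVR-MIB.txt. The names HVR_MAINT_NOTIFICATIONS and notification_name are illustrative only.

```python
# Sub-id to name mapping, copied from the notification list above.
HVR_MAINT_NOTIFICATIONS = {
    1: "hvrMaintNotifySummary",
    2: "hvrMaintNotifyError",
    3: "hvrMaintNotifyLatency",
    4: "hvrMaintNotifyJobError",
}


def notification_name(oid: str) -> str:
    """Map the last arc of a dotted OID string to an HVR Maint notification name.

    Assumption: the final arc is the notification number; check HVR-MIB.txt.
    """
    last_arc = int(oid.strip(".").rsplit(".", 1)[-1])
    return HVR_MAINT_NOTIFICATIONS.get(last_arc, f"unknown({last_arc})")
```

An OID ending in .4 would then be reported as hvrMaintNotifyJobError, and any unlisted number as unknown.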
Descriptive information for each of the notifications:
A summary/heartbeat that is sent every time HVR Maint runs, even if there are no errors: hvrMaintNotifySummary, with the number of errors, the number of jobs over the latency limit, and the number of job errors found in the log.
For errors encountered by HVR Maint itself (e.g. it cannot open the hvr.out log file, the HVR Scheduler is not running, or it cannot find the HVR Maint option file): hvrMaintNotifyError, with the error message.
For each replication job that is over the latency limit: hvrMaintNotifyJobLatency, with the job name and the latency in seconds.
For each (distinct) error in the log file: hvrMaintNotifyJobError, with the error message from HVR replication, including the F_Jxxxxx error code, the number of occurrences, the first occurrence, the last occurrence, and the text of the last occurrence.
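Because the summary notification is sent on every run, a monitoring system can treat its absence as a sign that HVR Maint itself has stopped running. The following is a minimal sketch of that idea, assuming the receiver passes each notification name and arrival time (in seconds) to a hypothetical HeartbeatWatch class. The maximum gap is a value chosen by the operator, typically somewhat longer than the interval at which HVR Maint is scheduled.

```python
from typing import Optional


class HeartbeatWatch:
    """Hypothetical helper that tracks the most recent hvrMaintNotifySummary."""

    def __init__(self, max_gap_seconds: float):
        self.max_gap_seconds = max_gap_seconds
        self.last_summary: Optional[float] = None

    def record(self, name: str, received_at: float) -> None:
        # Only the summary notification counts as a heartbeat.
        if name == "hvrMaintNotifySummary":
            self.last_summary = received_at

    def is_stale(self, now: float) -> bool:
        # No summary seen yet, or the last one is older than the allowed gap.
        if self.last_summary is None:
            return True
        return now - self.last_summary > self.max_gap_seconds
```

With a gap of 600 seconds, a watch that last recorded a summary at time 100 is still healthy at time 500 and stale at time 701.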