How To: Define a location to connect to a Greenplum database
Overview:
This article describes how to create a Greenplum database location and how to configure it in order for Local Data Processing GUI to be able to connect to a Greenplum database.
Prerequisites:
- The hvrremotelistener is running on the remote Greenplum machine
- The database that we are connecting to is 'testdb'
- Driver Manager Library is already present on the remote machine where Greenplum resides
- The odbcinst.ini file is already present on the remote machine where Greenplum resides
- Greenplum database should be running
- The DataDirect Manager Library must exist on the machine that Local Data Processing connects to Greenplum
- The odbcinst.ini file must be present on the Linux machine that Local Data Processing connects to Greenplum
- The gpfdist server needs to be used in order to perform bulk loading in Greenplum database
- For Greenplum it is necessary that you have a staging directory created on the target machine. In this case it is in the directory structure '/home/gpadmin/stage1'
- A channel number_demo is already created with source Oracle database named as 'orcl'
Steps:
Step 1: Create location under Location Configuration A] Right-click on Location Configuration
B] In the New location window specify the name of Location select 'Greenplum' under the class section. Here we name location as 'gplum'
C] Since our Greenplum is on the remote server check box for 'Connect to Local Data Processing on remote machine'. This action will unlock the fields in the Database Connection group box.
Node |
IP address of the machine where hvrremotelistener is running |
Port |
Port number on which hvrremotelistener is running.By default it is 4343 |
Login |
Username to connect to Greenplum database |
Password |
Password to connect to Greenplum database |
D] Under the section Database Connection fill in the below parameter. Below are the parameters.
Node |
IP address of the server where hvrremotelistener is running |
Port |
Greenplum Database clients connect with TCP to the Greenplum master instance at the client connection port, 5432. So by default we use that port. |
Database |
This is the name of the Greenplum database that we are connecting to. In our case it is 'testdb' |
User |
The username used to connect to the Greenplum database |
Password |
The password to connect to the Greenplum database |
Driver Manager Library |
This is required only if you are running Greenplum on linux/unix. It should point to the DataDirect library installation. |
ODBCINST |
This field is required only if you are running Greenplum on linux/unix. It should point to a properly configured odbcinst.ini file. |
Node IP address of the server where hvrremotelistener is running Port Greenplum Database clients connect with TCP to the Greenplum master instance at the client connection port, 5432. So by default we use that port. Database This is the name of the Greenplum database that we are connecting to. In our case it is 'testdb' User The username used to connect to the Greenplum database Password The password to connect to the Greenplum database Driver Manager Library This is required only if you are running Greenplum on linux/unix. It should point to the DataDirect library installation. ODBCINST This field is required only if you are running Greenplum on linux/unix. It should point to a properly configured odbcinst.ini file. E] Click on Test connection to see if it is working, if it is then click on OK to add this location to the list
Step 2: Configure gpfdist by running below command from the directory where your staging directory resides. In this example staging directory lies in path $home/gpadmin/stage1 so we will run the command from directory home/gpadmin (This step is required if Greenplum is a target machine) .Please refer https://www.hvr-software.com/wiki/Requirements_for_Greenplum for more information gpfdist -p 33333 -d stage1 -m 1048576
Step 3: While adding gplum as a TARGET to a channel below are the essential actions that need to be added to the channel
Action 1: As a best practice it is recommended to run Integrate in burst mode which is a default requirement for Greenplum
Action 2: LocationProperties on TARGET to identify the location for Staging directory and Staging Database
In this way our Greenplum database is ready to be used as a Target machine.