PostgreSQL Connector Overview
- The first initial sync or “historical” sync will replicate the entire PostgreSQL data into your destination warehouse. Depending on the size of your database, this can take anywhere from days to weeks. This historical sync does NOT contribute toward MAR.
- If additional tables need to be introduced later, this will not incur MAR.
- Decide how to set up your connector(s). You could set up one connector for static tables and one connector for dynamic tables to track MAR. Creating separate connectors lets you track and isolate changes coming to the database and keep them independent of each other. Changes can be grouped by function, relationship, etc. and can be a mix of small and large tables.
What Contributes Towards MAR?
There are 4 primary actions that will contribute to MAR:
- A new row is added to an existing table.
- A value is updated for an existing row in a table.
- A value is deleted within a table.
- A new column which has values is added to a table.
Managing MAR for this Connector
- If a table was imported initially and then unchecked and was left for a while, not being refreshed and then re-imported later on, every primary key row will count towards MAR
- If separate connectors sync the same data from the same source (with the same primary keys), they contribute separately towards your MAR. This means if you have two or more of the same connector type that sync from one source to multiple destinations, each connector’s active monthly rows are counted separately.
- If a table does not have a primary key, we create a synthetic (hash) primary key. The composition of this primary key differs by source. To create a synthetic primary key. MAR for these tables is based on their synthetic primary keys.
- Post load Transformations do not count towards monthly active rows
- Replication frequency makes no difference to your monthly active rows because MAR is based on how many unique rows are updated.
- If a row that has a primary key gets deleted and then after a while another row gets created with the same primary key as the deleted one, is still one MAR
- In a table without a primary key, removing or adding columns from which we generate the synthetic primary key does affect MAR.
- Make sure the Destination is always connected, if Fivetran cannot unload data into the destination, it will only hold this data up to 24 hours before discarding it. If it goes beyond 24 hours, depending how long the WAL is set up for, but assuming its less than 24 hours, it will need to do a historical re-sync which means incurring MAR. If the WAL is long enough, we may not need to do a re-sync as we can read through the Logs themselves.
- Regular vacuuming of Postgres needs to be done so the XMIN is always in an unfrozen state, otherwise a full historical sync needs to be carried out which will incur MAR.
- Best Practice is to modify only new records in Postgres. Modifying very old records every sync causes a high percentage of these tables to incur MAR.
- Blocking schemas or tables from syncing if they do not contain valuable information.
- Column blocking to reduce MAR is only applicable to tables that are without Primary keys in Postgres. This is because we create a synthetic primary key based on a commonly changing column + other unique keys. If the commonly changing column is blocked then this will reduce MAR
- Using XMIN instead of WAL in PostgreSQL will have little effect on MAR consumption. XMIN does not capture deletes, so within the CUD operations Delete operations are not captured.
- It is recommended not to pause the connector for too long because then Fivetran will need to go to historical resync rather than a log read because the logs have run out.