Question
Your PostgreSQL connector is charging more Monthly Active Rows (MAR) than the number of rows updated in the source database. This is most easily seen with insert-only tables where no update or delete operations are ever executed.
Environment
- Fivetran PostgreSQL connectors using XMIN replication type
- Postgres source database is version 9.4 or later
Answer
Follow this guide to enable PostgreSQL plugins and a wrapper functions. Fivetran can then use a new set of extraction queries that will filter out frozen tuples.
Cause
XMIN and Transaction IDs (XID)
The Fivetran PostgreSQL XMIN connectors track changes by the the hidden xmin
system column that is present in all PostgreSQL tables. This is a 32 bit value that represents the XID of the inserting transaction for this row version. It increases sequentially with every update to a row (aka tuple) such as inserts, updates, and deletes. A 32 bit number has a minimum value of 0 and a maximum value of 4,294,967,295.
Imagine a PostgreSQL database running a shopping cart. Every time you go on their site and add something to a cart, the database updates a table with a transaction and that transaction is assigned an ID value which is also the XMIN value. The existing XMIN value then increases by 1.
If this database has been running for 10 years and this is a busy production database, then it would not take long for the total transactions to reach the maximum of 4,294,967,295. What happens when the database runs out of IDs to assign to new transactions? Do you just get a new database?
XMIN Wraparound
As databases grew in the number of transactions they had to handle during the lifetime of an installation, PostgreSQL devs saw the need to have a higher maximum number of transactions to support a database with a long operational life. They added an additional 32 bit field which would increase by 1 every time the database reaches 4.2 billion transactions and sets the transaction ID (XID) counter back to zero. This is known as "wraparound".
Vacuuming
Postgres internally deals with the wraparound by freezing old transactions during the VACUUM
Process. During the VACUUM
process, Postgres marks the old transactions as frozen in time. In Postgres Version 9.4 and before, this was done by changing the XMIN value to FrozenTransactionId(2)
. After Postgres Version 9.4, XMIN values were not being changed directly anymore. Postgres was accomplishing this by changing an internal field in the tuple header called infomask
. This basically means that multiple rows can have the same XMIN values. This article explains XID wraparound failures in more detail.
What were the consequences for Multiple Rows with the same XMIN?
If you are using Postgres Version before 9.4, there are no MAR counting issues because frozen tuples are set to FrozenTransactionId(2)
If you are using Postgres Version 9.4 and above, Fivetran’s incremental query might get old records and new records with the same XMIN range. This is because our original extract queries have no way to filter out tuples frozen by the VACUUM process.
For example, let’s say we query for records where age(xmin) > 0 and age(xmin) < 1000. Since there could be multiple wraparound generations of data (a wraparound happens every 4.2 billion transactions) that can have xmin between 0 and 1000, we will end up extracting all rows; frozen and unfrozen.
This is the cause of MAR being higher than expected. Fivetran bills by counting the number of unique records received from the extract phase. It will count even the old transactions extracted from the same XMIN range.