SQL Server Connector Overview
- The first initial sync or “historical” sync will replicate the entire SQL Server data into your destination warehouse for all tables the “Fivetran” user was given SELECT permissions to except those excluded from the sync within the Fivetran dashboard
- There are a variety of ways you may consider to configure your connector(s); creating one single connector is easier to manage, whereas creating separate connectors lets you track and isolate changes coming to the database and keep them independent of each other:
- You could set up one connector for static tables and one connector for dynamic tables to track MAR
- Connectors can be grouped by any arbitrary group - for example, business unit, function or relationship - and can be a mix of small and large tables
- An important configuration option to consider is the incremental update mechanism you choose to proceed with:
- Both Change Data Capture (CDC) and Change Tracking (CT) are available on all platforms where your database is hosted
- There are pros and cons to the use of CDC versus CT ; what is important to note within this document is that CDC incurs greater MAR than CT
- An aside to this is that you can specify either CDC or CT at a table level and so can leverage both options within the same database; if both CDC and CT are enabled, Fivetran will default to CT for that table
What Contributes Towards MAR?
There are 5 primary actions that will contribute to MAR:
- A new row is added to an existing table.
- A value is updated for an existing row in a table.
- A value is deleted within a table.
- A new column that has values is added to a table.
- A manual or automatic resync is triggered.
Managing MAR for this Connector
- If separate connectors sync the same data from the same source (with the same primary keys), they contribute separately towards your MAR; this means if you have two or more of the same connector type that sync from one source to multiple destinations, each connector’s MAR will be counted separately
- Adding an empty column does not contribute to MAR
- Tables without a primary key can only be synced if CDC is enabled
- For these tables, Fivetran creates a synthetic primary key (with the column “fivetran_id”) that is a composite hash of all non-Fivetran column values within the row; MAR is then incurred for updates based on that “fivetran_id”
- The scenario detailed in Fivetran’s documentation - an update in the source table (without the primary key) being treated as both a DELETE and an INSERT - will incur MAR for each action (as two rows are being created in the Destination)
- If a row that has a primary key gets deleted and then after a while another row gets created with the same primary key as the deleted one, this will only count towards one MAR
- In a table without a primary key, removing or adding columns from which we generate the synthetic primary key does incur MAR for every change that is made
- Should you wish to migrate tables (from your current setup to a differing setup; this could be, for example, migrating from one large table to three smaller tables) then you will only incur MAR if you modify the existing incumbent table
- Make sure the Destination is always connected; if Fivetran cannot unload data into the Destination, it will only hold this data for up to 24 hours before discarding it
- If this disconnection goes beyond 24 hours - depending on how long your incremental update mechanism is set up for, but assuming it’s less than 24 hours - Fivetran will need to do a historical re-sync and incur MAR
- If the incremental update mechanism is long enough, we may not need to do a re-sync as we can read through the change logs themselves
- The configuration of the CDC and/or CT log retention period is therefore important to ensure that this period is longer than the synchronization schedule you set up against the connector in Fivetran; CDC and its associated stored procedures incur both more storage and more compute and so may require a more frequent purge of the change tables, so if choosing CDC to ensure you are even more mindful of that synchronization schedule
- The pausing of a connector should also take the change log retention period into account; do not pause the connector for longer than those logs are retained
- Modify only new records; modifying very old records every sync causes a high percentage of these tables to incur MAR, where the updated records may not possess any significant business value.