SQL Server is Microsoft’s SQL database. Fivetran’s integration service replicates data from your SQL Server source database and loads it into your destination at regular intervals.
Row-based relational databases, like SQL Server, are optimized for high-volume, high-frequency transactional applications. While very performant as production databases, they are not optimized for analytical querying. Your analytical queries will be very slow if you build your BI stack directly on top of your transactional SQL Server database, and you run the risk of slowing down your application layer.
Column-based databases are optimized for performing analytical queries on large volumes of data at speeds far exceeding those of SQL Server. While these databases are not good for high-frequency transactional applications, they are highly efficient in data storage. They permit more data compression (10x-100x) than row-based databases, which makes them a cost effective way to store and access data for analytical purposes.
Supported serviceslink
Fivetran supports three SQL Server database services:
- Generic SQL Server
- Amazon SQL Server RDS
- Azure SQL Database
- Azure SQL Managed Instance
- Google Cloud SQL for SQL Server PRIVATE PREVIEW
Supported configurationslink
Fivetran supports the following SQL Server configurations:
Supportability Category | Supported Values |
---|---|
Database versions | SQL Server 2012 - 2019 |
Maximum table count per connector | 15,000 |
Maximum column count per table | 1,019 |
Maximum throughput * | 5.0 MBps |
IMPORTANT: We do not support single-user mode.
* Maximum Throughput (MBps) is your connector’s end-to-end update speed. We calculate MBps by averaging the number of rows synced per second during your connector’s last 3-4 syncs. To learn more about sync speed, see the Replication speeds section.
Network protocol | Supported Versions | Notes |
---|---|---|
Transport Layer Security (TLS) | TLS 1.0, TLS 1.1, TLS 1.2 | We can only support TLS versions that your corresponding version of the database supports. |
Which SQL Server instance types we can connect to depend on whether you use change tracking or change data capture as your incremental update mechanism. Read our Updating data documentation for more information.
Instance Types | Change Tracking | Change Data Capture |
---|---|---|
Generic SQL Server | ||
Primary instance | check | check |
Availability group replica | check | |
Amazon SQL Server RDS | ||
Primary instance | check | check |
Availability group replica | check | |
Azure SQL Database | ||
Primary instance | check | check |
Availability group replica | check | |
Azure SQL Managed Instance | ||
Primary instance | check | check |
Availability group replica | check | |
Google Cloud SQL for SQL Server | ||
Primary instance | check | check |
Availability group replica | check |
Featureslink
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | All tables and fields |
Custom data | check | All tables and fields |
Data blocking | check | Column level, table level, and schema level |
Column hashing | check | |
Re-sync | check | Table level |
History | ||
API configurable | check | |
Priority-first sync | ||
Private networking | check | AWS PrivateLink: Generic SQL Server on EC2, SQL Server RDS Azure Private Link: Azure SQL Database, Azure SQL Managed Instance |
Fivetran data models |
Setup guidelink
Follow our step-by-step setup guides for specific instructions on how to set up your SQL Server database type:
- Generic SQL Server
- Amazon SQL Server RDS
- Azure SQL Database
- Azure SQL Managed Instance
- Google Cloud SQL for SQL Server PRIVATE PREVIEW
Sync overviewlink
Once Fivetran is connected to your database, we pull a full dump of all selected data from your database. We then use one of SQL Server’s two built-in tracking mechanisms, change tracking and change data capture, to pull all your new and changed data at regular intervals. If data in your database changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination.
Syncing empty tables and columnslink
Fivetran can sync empty tables for your SQL Server connector.
We can also sync empty columns in most cases. However, if you don’t add rows after you create a new column, we cannot sync that new column. We need at least one row to see a new column because we learn of changes to a table’s column cardinality when we see a row with a new or removed column during an update.
For more information, see our Features documentation.
Replication speedslink
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies in the format of the data we receive versus how the data is stored at rest in the data destination.
The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.
To measure the rate of new data in your database, check the disk space usage metrics over time for databases hosted on cloud providers. For self-hosted databases, you can run the following query to determine disk space usage:
SELECT SUM(bytes)/1024/1024 AS MB FROM dba_segments;
Schema informationlink
Fivetran tries to replicate the exact schema and tables from your database to your destination.
Fivetran-generated columnslink
Fivetran adds the following columns to every table in your destination:
_fivetran_deleted
(BOOLEAN) marks rows that were deleted in the source table_fivetran_synced
(UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformation and mappinglink
As we extract your data, we match SQL Server data types to data types that Fivetran supports. If we don’t support a certain data type, we automatically change that type to the closest supported type or, for some types, don’t load that data at all. Our system automatically skips columns with data types that we don’t accept or transform.
The following table illustrates how we transform your SQL Server data types into Fivetran supported types:
SQL Server Type | Fivetran Type | Fivetran Supported |
---|---|---|
BIGINT | BIGINT | True |
BINARY | BINARY | True |
BIT | BOOLEAN | True |
CHAR | STRING | True |
DATE | DATE | True |
DATETIME | TIMESTAMP_NTZ | True |
DATETIME2 | TIMESTAMP_NTZ | True |
DATETIMEOFFSET | TIMESTAMP | True |
DECIMAL | DECIMAL | True |
FLOAT | DOUBLE | True |
GEOMETRY | JSON | True |
GEOGRAPHY | JSON | True |
HIERARCHYID | NVARCHAR | True |
IMAGE | BINARY | True |
INTEGER | INTEGER | True |
MONEY | DECIMAL | True |
NCHAR | STRING | True |
NTEXT | STRING | True |
NUMERIC | DECIMAL | True |
NVARCHAR | STRING | True |
REAL | REAL | True |
ROWVERSION | BINARY | True |
SMALLDATETIME | TIMESTAMP_NTZ | True |
SMALLMONEY | DECIMAL | True |
SMALLINT | SMALLINT | True |
TEXT | STRING | True |
TIME | STRING | True |
TIMESTAMP | BINARY | True |
TINYINT | SMALLINT | True |
UNIQUEIDENTIFIER | STRING | True |
VARCHAR | STRING | True |
VARBINARY | BINARY | True |
XML | STRING | True |
We also support syncing user-defined data types. When you create a user-defined type in SQL Server, you are required to choose a base type. If Fivetran supports that base type, we automatically transform your user-defined type to its corresponding Fivetran type. For more information, see Microsoft’s user-defined types documentation.
If we are missing an important type that you need, please reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
Excluding source datalink
If you don’t want to sync all the data from your database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Column Blocking documentation.
Alternatively, you can change the permissions of the Fivetran user you created and restrict its access to certain tables or columns.
How to allow only a subset of tables
In your primary database, you can grant SELECT permissions to the Fivetran user on all tables in a given schema:
GRANT SELECT on SCHEMA::<schema> to fivetran;
or only grant SELECT permissions for a specific table:
GRANT SELECT ON [<schema>].[<table>] TO fivetran;
How to allow only a subset of columns
You can restrict the column access of your database’s Fivetran user in two ways:
-
Grant SELECT permissions only on certain columns:
GRANT SELECT ON [<schema>].[<table>] ([<column 1>], [<column 2>], ...) TO fivetran;
-
Deny SELECT permissions only on certain columns:
GRANT SELECT ON [<schema>].[<table>] TO fivetran; DENY SELECT ON [<schema>].[<table>] ([<column X>], [<column Y>], ...) TO fivetran;
Initial Synclink
Once Fivetran is connected to your database or read replica, we first copy all rows from every table in every schema for which we have SELECT permission (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy rows by performing a SELECT statement on each table. For large tables, we copy a limited number of rows at a time so that we don’t have to start the sync over from the beginning if our connection is lost midway. How many rows we copy at a time depends on whether your tables have clustered indices or not. For tables with clustered indices, we copy 500,000 rows at a time. For tables without clustered indices, we copy 5,000,000 rows at a time.
Connecting to a read replicalink
You can only connect Fivetran to a read replica if Change-Data Capture is enabled on the primary database. We need this in order to perform our incremental updates.
Updating datalink
Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database. During incremental updates, we request only the data that has changed since our last sync. We use one of SQL Server’s two built-in tracking mechanisms for incremental updates: change tracking (CT) and change data capture (CDC).
Both CT and CDC create change records that Fivetran accesses on a per-table basis during incremental updates. To guarantee data integrity, we check for changes on every table with CT or CDC enabled during each update, which can add to the sync time. Customers who sync with many thousands of tables can therefore expect longer syncs.
Tables that do not have CT or CDC enabled still appear on your Fivetran dashboard, but they are disabled. They also have a message indicating that you need to enable either CT or CDC.
If CT and CDC are both enabled on a table, we use CT as the incremental update mechanism.
CT vs CDClink
There are several key differences between change tracking (CT) and change data capture (CDC):
Feature | Change tracking | Change data capture |
---|---|---|
Syncs tables with primary keys | Yes | Yes |
Syncs tables without primary keys | No | Yes |
Can read from a replica | No | Yes |
Tracks DML changes | Yes | Yes |
Provides full record of changed data | No | Yes |
Tracks previous changes | No | Yes |
Note: CDC has heavier processing and storage overhead than CT.
To learn more about CDC and CT, read on below or see Microsoft’s Track Data Changes documentation.
Change trackinglink
Change tracking (CT) records when a row in a table has changed, but does not capture the data that was changed. CT also does not capture how many times the row changed or record any previous changes. SQL Server records changes from all tables that have CT enabled in a single internal change table.
You cannot sync tables without primary keys if you choose CT as your incremental update mechanism. CT needs primary keys to identify rows that have changed.
You must enable CT on the primary database, as well as on each individual table that you want to sync. When you enable CT on your primary database, you can select a window size. Window size determines how long your change records are kept in the change table before they are deleted. If not explicitly specified, the default value is 3 days. We recommend changing the window size to 7 days. A longer window gives us more time to resolve any potential sync issues before change records are deleted. Our system detects when we were unable to process changes to a table before they were deleted from the change table. When we detect this situation, we trigger a re-sync for that table.
Change tracking is a lightweight background process that should not impact your production workload. CT takes up minimal storage space on your hard drive because its change table only records the primary keys of changed rows.
Change data capturelink
NOTE: Fivetran can only sync tables that have a single CDC capture instance. Our syncs only include tables and columns that are present in a CDC instance. If you add new tables or columns, you must create a new CDC instance that includes them and delete the old instance.
Change data capture (CDC) tracks every change that is applied to a table and records those changes in a shadow history table. Unlike CT, CDC captures what data was changed and when, so you can see how many times a row has changed and view past changes.
CDC can track changes on any kind of table, with or without primary keys. Due to SQL Server limitations, a SQL Server connector with CDC enabled automatically propagates column type changes to your destination but does not propagate any other schema changes, such as renamed columns, new tables, and new columns.
You must enable CDC on the primary database, as well as on each individual table that you want to sync. When you enable CDC on your primary database, you can select a window size (also known as a retention period). Window size determines how long your change records are kept in the shadow history table before they are deleted. Unless specified, the default window size is 3 days. We recommend increasing the window size to 7 days. A longer window gives us more time to resolve any potential sync issues before change records are deleted.
CDC is a heavier process than CT. CDC takes up more storage space in your database because it captures entire changed records, not just the primary keys of changed rows. CDC also uses more compute resources than CT because it writes each table’s changes to its own shadow history table. If you want to reduce some of the load on your production database, you can configure CDC to read from a replica, though you will need to enable it on your primary database first.
Tables with a primary keylink
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
- A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted = TRUE
- An UPDATE in the source table updates the corresponding row in the destination
Tables without a primary keylink
Note: Fivetran cannot sync tables without a primary key using CT. You must have CDC enabled to sync tables without a primary key.
We handle changes to tables without a primary key differently:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
. - The
_fivetran_id
column helps us handle DELETE operations:- If there is a row in the destination that has a corresponding
_fivetran_id
value, that row will be updated with_fivetran_deleted = TRUE
. - If there is not a row in the destination that has a corresponding
_fivetran_id
value, a new row will be added with_fivetran_deleted = TRUE
.
- If there is a row in the destination that has a corresponding
- An UPDATE in the source table is treated as a DELETE followed by an INSERT, so it results in two rows in the destination:
- A row containing the old values with
_fivetran_deleted = TRUE
- A row containing the new values with
_fivetran_deleted = FALSE
- A row containing the old values with
As a result, one record in your source database may have several corresponding rows in your destination. For example, suppose you have a products
table in your source database with no primary key:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Boogie robot | 2 |
Cookie robot | 3 |
You load this table into your destination during your initial sync, creating this destination table:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | ‘2000-01-01 00:00:00’ | 0 | FALSE | asdf |
Cookie robot | 2 | ‘2000-01-01 00:00:00’ | 1 | FALSE | dfdf |
Boogie robot | 3 | ‘2000-01-01 00:00:00’ | 2 | FALSE | ewra |
You then update a row:
UPDATE products SET quantity = 4 WHERE description = 'Cookie robot';
After your UPDATE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | ‘2000-01-01 00:00:00’ | 0 | FALSE | asdf |
Cookie robot | 2 | ‘2000-01-01 00:00:00’ | 3 | TRUE | dfdf |
Boogie robot | 3 | ‘2000-01-01 00:00:00’ | 2 | FALSE | ewra |
Cookie robot | 4 | ‘2000-01-01 00:00:00’ | 4 | FALSE | zxfd |
You then delete a row:
DELETE FROM products WHERE description = 'Boogie robot';
After your DELETE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | ‘2000-01-01 00:00:00’ | 0 | FALSE | asdf |
Cookie robot | 2 | ‘2000-01-01 00:00:02’ | 3 | TRUE | dfdf |
Cookie robot | 4 | ‘2000-01-01 00:00:02’ | 4 | FALSE | zxfd |
Boogie robot | 3 | ‘2000-01-01 00:00:02’ | 5 | TRUE | ewra |
So, while there may be just one record in your source database where description = Cookie robot
, there are two in your destination - an old version where _fivetran_deleted = TRUE
, and a new version where _fivetran_deleted = FALSE
.
We also de-duplicate rows before we load them into your destination. We use the _fivetran_id
field, which is the hash of the non-Fivetran values in every row, to avoid creating multiple rows with identical contents. If, for example, you have the following table in your source:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Then your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | ‘2000-01-01 00:00:00’ | 0 | FALSE | asdf |
Deleted rowslink
We don’t delete rows from the destination, though the way for how we process deletes differs for tables with primary keys and tables without primary keys.
Change tracking
You cannot sync tables without a primary key because CT requires primary keys to record changes. Tables without primary keys are excluded from your syncs. They are still listed in your Fivetran dashboard, but appear disabled.
If you want to add a primary key to a table, you can run the following query in your primary database:
ALTER TABLE <schema>.<pkless_table> ADD _id int NOT NULL IDENTITY (1,1) PRIMARY KEY
The new _id
primary key column…
- Auto-generates a sequential
_id
value when a new record is added - Cannot be changed or overwritten with new values
- Automatically populates on all records when added to an existing table
Change data capture
We merge changes to tables without primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
- A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted = TRUE
- An UPDATE in the source table soft-deletes the existing row in the destination by setting
_fivetran_deleted = TRUE
, then inserts a new row with the updated values
Migrating Service Providerslink
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won’t retain the same change tracking data as your original SQL Server database.