The data that your company’s software services provide can offer much more insight than what you can access on their internal dashboards.
Many of these services offer APIs (Application Programming Interfaces) for accessing or extracting their data over a direct and secure internet connection. Fivetran’s integration platform continuously extracts the underlying data from your existing applications and centralizes it all in your cloud warehouse, database, or data lake. For more information about data storage platforms that we support, see our Destinations documentation.
Fivetran supports replicating data from multiple API Cloud Applications. For a full list of supported APIs, see the list to the left. If there is an integration that you would like but that is not yet supported, please let us know.
- Connect to service using OAuth if available. OAuth or Open Authorization lets you connect with Fivetran by directly logging into the application you’re connecting with. It grants us restricted access to your account, and protects your sensitive information. We use OAuth for all APIs that support it.
- Initial Dump of Data: Fivetran discovers all available standard and custom objects and automatically pulls data for all objects that it has access to. The initial sync time differs for each connector type. It can be anywhere from 1 hour to 1 month, depending on the application’s API limits.
- Transform & Map Schema: Fivetran parses through all data, typecasting and mapping every column in the source object to a column in a corresponding destination table. We transform any data types not natively supported by the destination into supported data types. Fivetran does not do any aggregations at this step.
- Load: Fivetran automatically creates schemas (one per connector) and tables within schemas for each mapped source object. Fivetran populates these tables with the initial dump of data.
- Update: Fivetran incrementally updates each connector in batches, using a merge operation (changed rows are updated and new rows are inserted). We only update changed or new data. These batches run on different time intervals. The connector sync frequency ranges from every 5 minutes to every 24 hours and can be set for each connector. Our system automatically recognizes schema changes in the source and persists these changes to the destination.
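The merge behavior described in the Update step can be illustrated with a minimal Python sketch. It is not Fivetran's implementation: the in-memory `destination` dictionary, the `merge_batch` helper, and the `id` primary key are all assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def merge_batch(table: dict[Any, Row], batch: list[Row], key: str = "id") -> None:
    """Upsert every row of the batch into the table (hypothetical helper)."""
    for row in batch:
        # An existing key is overwritten (update); a new key is added (insert).
        table[row[key]] = row


# Destination table after the initial load, keyed on an assumed "id" column.
destination = {1: {"id": 1, "name": "Acme", "plan": "free"}}

batch = [
    {"id": 1, "name": "Acme", "plan": "pro"},     # changed row
    {"id": 2, "name": "Globex", "plan": "free"},  # new row
]
merge_batch(destination, batch)
```

After the merge, row 1 carries the new `plan` value and row 2 exists alongside it; rows that were not part of the batch are left untouched.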
Every API integration generates a schema with two types of tables:
- Standard tables are predetermined database tables and are common across all organizations.
- Custom tables are database tables that allow you to store information unique to your organization.
Standard tables can also generate custom fields, which are specific to an organization. We pursue a default sync-all strategy and we try to sync every table (custom or standard) and every field (custom or standard) that we possibly can. We do not sync custom tables for a few APIs, but for most APIs, we sync custom fields.
Some data types, such as binary data, do not make sense to store in a SQL-based data warehouse, so Fivetran deliberately does not sync them. For a full list of the data that is synced for each application connector, visit the individual connector documentation page.
For every schema, we create a fivetran_audit table that provides you metadata about each table in the schema.
We pursue a “sync-all” strategy, syncing as many source tables or objects as possible using as little setup and configuration as possible. All Fivetran application connectors are pull connectors. Fivetran periodically pulls new or changed data from the source.
A single API connection (shown below in the blue connection icon circle) results in a single schema, with multiple tables. For example, connecting with your Salesforce API will create a Salesforce schema (with Salesforce tables) in your destination. You cannot sync two different APIs or schemas into a single schema.
For application connectors, you can specify a name for the schema in the first step of the connector setup. This will become the name of the connector, and we will load every table from the source into this schema. You cannot, however, specify table and column names. Fivetran will auto-generate these names from the names of corresponding objects. See our Naming Conventions documentation for the details about the naming conventions we use when generating the table and column names.
Excluding Source Data
Depending on whether an API allows you to do it, we give you the option of not syncing specific tables or columns with Fivetran. You can block columns on the Schema tab of your dashboard.
If data in the source changes (e.g., you add new columns or custom fields, or change a data type), Fivetran automatically detects and persists these schema changes into your destination. After the initial sync, Fivetran incrementally pulls updates of new or changed data from the source. To make these incremental syncs, Fivetran maintains an internal set of progress cursors for every table that we sync. Fivetran only records progress to the cursors when an update is successfully loaded into the destination. This provides an airtight handoff between syncs so that no data is missed, and it makes our system extremely tolerant of service interruptions. If there is an interruption in service, such as your destination becoming unavailable, Fivetran automatically resumes syncing exactly where it left off once your destination is live again, even days or weeks later.
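The cursor handoff described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Fivetran's internal code: the `incremental_sync` function, the integer positions, and the `failing_load` stand-in for an unavailable destination are all hypothetical.

```python
from typing import Callable


def incremental_sync(
    source: list[tuple[int, str]],
    cursor: int,
    load: Callable[[list[str]], None],
) -> int:
    """Load records past the cursor and return the new cursor position."""
    pending = [(pos, rec) for pos, rec in source if pos > cursor]
    if not pending:
        return cursor
    # If load() raises, we never reach the return below, so the caller's
    # cursor keeps its old value and the next sync retries the same records.
    load([rec for _, rec in pending])
    return max(pos for pos, _ in pending)


source = [(1, "a"), (2, "b"), (3, "c")]
destination: list[str] = []
cursor = 1  # record 1 was loaded by an earlier sync


def failing_load(records: list[str]) -> None:
    raise ConnectionError("destination unavailable")


try:
    cursor = incremental_sync(source, cursor, failing_load)
except ConnectionError:
    pass  # the cursor was not advanced

# The destination is live again: the sync resumes from the saved cursor.
cursor = incremental_sync(source, cursor, destination.extend)
```

Because the cursor only moves after a successful load, the failed attempt loses nothing: the second call picks up records 2 and 3 and then advances the cursor to 3.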
For most application connectors, Fivetran uses a change data capture (CDC) strategy to pull changes from the service API. The specific column that Fivetran uses to track changes varies between services, but it is often a last-modified date column. Because Fivetran does not receive every change to a row, only the net change to the row between syncs, we don’t support snapshotting of the data. Rather, Fivetran supports a model of Eventual Consistency.
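A small sketch may help show why intermediate changes are not captured. It assumes a hypothetical `last_modified` column and a `changed_since` helper; real services expose different fields, and this is not how any specific connector is implemented.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def changed_since(rows: list[Row], last_sync: datetime) -> list[Row]:
    """Return rows whose assumed last_modified column is newer than last_sync."""
    return [row for row in rows if row["last_modified"] > last_sync]


source = {1: {"id": 1, "status": "new", "last_modified": datetime(2024, 1, 1, 8, 0)}}
last_sync = datetime(2024, 1, 1, 9, 0)

# Two edits happen between syncs. The source keeps only the latest state.
source[1] = {"id": 1, "status": "open", "last_modified": datetime(2024, 1, 1, 10, 0)}
source[1] = {"id": 1, "status": "closed", "last_modified": datetime(2024, 1, 1, 11, 0)}

delta = changed_since(list(source.values()), last_sync)
```

The next sync sees a single changed row with `status` set to `closed`. The intermediate `open` state never reaches the destination, which is why the destination is eventually consistent with the source rather than a history of every change.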
By default, application connectors sync all new and modified data every 6 hours. Depending on the size of an update, a sync may take longer than 6 hours. In that case, the next sync starts at the next 6-hour interval, for example:
Sync X Start: 6:00am
Sync X Finish: 1:18pm
Sync Y Start: 6:00pm
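The timing in this example can be reproduced with a short sketch. It assumes that interval boundaries are aligned to midnight, which matches the example but is an assumption rather than documented behavior; `next_sync_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_sync_start(finished_at: datetime, interval_hours: int = 6) -> datetime:
    """Return the first interval boundary strictly after finished_at.

    Assumes boundaries are aligned to midnight (00:00, 06:00, 12:00, ...).
    """
    midnight = finished_at.replace(hour=0, minute=0, second=0, microsecond=0)
    interval = timedelta(hours=interval_hours)
    # Number of whole intervals already elapsed today, plus one for the next.
    slots = (finished_at - midnight) // interval + 1
    return midnight + slots * interval


# Sync X starts at 6:00am, overruns the 12:00pm boundary, and finishes at 1:18pm.
sync_y_start = next_sync_start(datetime(2024, 1, 1, 13, 18))
```

Here `sync_y_start` falls at 6:00pm on the same day, skipping the 12:00pm boundary that passed while Sync X was still running.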
You can change the sync frequency on the Setup tab of your Fivetran dashboard. See our Sync Frequency and Scheduling docs for details. If a connector sync encounters an error, we retry it repeatedly, waiting either 1 hour or the set sync frequency period between attempts, whichever is shorter. See our Sync Start Times and Offsets documentation for details on sync start times for failed syncs.
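As a quick illustration of the "whichever is shorter" rule, the retry wait can be expressed as a minimum of two durations. The `retry_delay` name is hypothetical.

```python
from datetime import timedelta


def retry_delay(sync_frequency: timedelta) -> timedelta:
    """Wait before retrying a failed sync: 1 hour or the sync frequency, whichever is shorter."""
    return min(timedelta(hours=1), sync_frequency)


fast_connector = retry_delay(timedelta(minutes=5))  # frequency is shorter than 1 hour
default_connector = retry_delay(timedelta(hours=6))  # capped at 1 hour
```

Under this rule, a connector set to sync every 5 minutes retries after 5 minutes, while a connector on the default 6-hour frequency retries after 1 hour.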
In general, Fivetran will never delete data from your destination. Fivetran handles deleted data differently for different connectors.
For connectors where the API supports it, we create an extra column in each table. This column is named is_deleted and has data type BOOLEAN. We use this column to indicate whether a row was deleted in the source object. We mark the row as TRUE if it is deleted in the source.
Some APIs do not allow us to detect or indicate when data has been deleted in the source.
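For connectors where deletes can be detected, the is_deleted marking can be sketched as below. How a deletion is actually detected depends on each API; comparing the destination keys against the set of IDs still present in the source is an assumption made for this example, and `mark_deleted` is a hypothetical helper.

```python
from typing import Any

Row = dict[str, Any]


def mark_deleted(destination: dict[int, Row], source_ids: set[int]) -> None:
    """Flag rows that no longer exist in the source instead of removing them."""
    for key, row in destination.items():
        if key not in source_ids:
            row["is_deleted"] = True
        else:
            # Rows still present in the source keep (or receive) a FALSE flag.
            row.setdefault("is_deleted", False)


destination = {
    1: {"id": 1, "name": "Acme"},
    2: {"id": 2, "name": "Globex"},
}

# Row 2 was deleted in the source, so only ID 1 remains there.
mark_deleted(destination, source_ids={1})
```

Both rows remain in the destination after the call. Row 2 is flagged with `is_deleted` set to `True`, so queries can filter it out without the historical data being lost.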