The data that your company’s software services provide can offer much more insight than what you can access on their internal dashboards.
Many of these services offer APIs (application programming interfaces) for accessing or extracting their data over a direct and secure internet connection. Fivetran’s integration platform continuously extracts the underlying data from your existing applications and centralizes it in a petabyte-scale data warehouse.
Fivetran supports replicating data from several API Cloud Applications. For a full list of supported APIs, see the list to the left. If there is an integration that you would like but that is not yet supported, please let us know.
- Connect to service using OAuth if available. OAuth or Open Authorization lets you connect with Fivetran by directly logging into the application you’re connecting with. It grants us restricted access to your account, and protects your sensitive information. We use it for all APIs that support it.
- Initial Dump of Data: Fivetran discovers all available standard and custom objects, and automatically pulls data for every object that it has access to. The initial sync time differs for each connection type and can take anywhere from 1 hour to 1 month, depending on the API’s limits.
- Transform & Map Schema: Fivetran parses all data, typecasting and mapping every column in the source object to a column in a corresponding destination table. We transform any data types that are not natively supported in the destination into data types that are accepted. Fivetran does not do any aggregations at this step.
- Load: Fivetran automatically creates schemas (one per integration) and tables for each mapped source object. Fivetran populates these tables with the initial dump of data.
- Update: Fivetran incrementally updates each integration in batches, using a merge operation (upsert) that touches only the changed or new data. These batches run at different time intervals per integration, ranging from every 1 hour to every 24 hours. Fivetran automatically recognizes schema changes in the source and persists these changes to the destination.
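The merge described in the Update step can be pictured with a small sketch. This is only an illustration of a generic upsert, not Fivetran's implementation: the function name `merge_batch`, the in-memory `destination` dictionary, and the `id` primary key are all assumptions made for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_batch(destination: Dict[Any, Row], batch: List[Row], key: str = "id") -> None:
    """Upsert a batch of changed rows into a destination table.

    `destination` is a hypothetical stand-in for a warehouse table,
    keyed by primary key. Rows not in the batch are left untouched.
    """
    for row in batch:
        existing = destination.get(row[key])
        if existing is None:
            # New row: insert a copy so the caller's batch is not aliased.
            destination[row[key]] = dict(row)
        else:
            # Existing row: overwrite only the columns present in the batch.
            existing.update(row)
```

Only rows that appear in the batch are written, which is what keeps each incremental update proportional to the amount of changed data rather than to the size of the table.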
Every API integration generates a schema with two types of tables:
- Standard tables are predetermined database tables and are common across all organizations.
- Custom tables are database tables that allow you to store information unique to your organization.
Standard tables can also generate custom fields, which are specific to an organization. We pursue a default sync-all strategy and we try to sync every table (custom or standard) and every field (custom or standard) that we possibly can. We do not sync custom tables for a few APIs, but for most APIs, we sync custom fields.
Some data types, for example binary data, do not make sense to store in a SQL-based data warehouse, so Fivetran does not sync this type of data. For a full list of the data that is synced for each cloud application integration, please visit the individual API Cloud Integration page.
For every schema we create a fivetran_audit table that gives you metadata about each table.
We pursue a “sync-all” strategy, syncing as many source tables or objects as possible via as little setup and configuration as possible. All Fivetran API cloud connections are pull integrations. Fivetran periodically pulls new or changed data from the source.
A single API connection (shown below in the blue connection icon circle) results in a single schema, with multiple tables. For example, connecting with your Salesforce API will create a Salesforce schema (with Salesforce tables) in your destination. You cannot sync two different APIs or schemas into a single schema.
For cloud application connections, users can decide the name of the schema in the first step of the integration. This will become the name of the integration and we will load every table from the integration into this schema. You cannot, however, decide the name of tables. Fivetran will auto-generate these names from the names of corresponding objects. When we name your tables, we follow these conventions:
- convert all upper-case letters to lower-case
- replace a space with an underscore: “Some Name” becomes some_name
- separate two words that are joined, with an underscore: “anotherName” becomes another_name
- convert a double underscore to single underscore
- remove any underscore prefix
|Table Names||Table Name Conversions|
These same conventions are applied to column names.
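The naming conventions listed above can be approximated in a few lines of Python. This is a sketch of the stated rules only; the function name `to_destination_name` is hypothetical, and the handling of cases the rules do not mention (acronyms such as "APIKey", or a digit followed by a capital letter) is an assumption rather than documented behavior.

```python
import re


def to_destination_name(source_name: str) -> str:
    """Apply the listed table/column naming conventions to a source name."""
    # Split joined words first, while the lower-to-upper boundary is still visible.
    # Treating a digit before a capital as a boundary is an assumption.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", source_name)
    # Replace spaces with underscores.
    name = name.replace(" ", "_")
    # Convert all upper-case letters to lower-case.
    name = name.lower()
    # Collapse runs of underscores into a single underscore.
    name = re.sub(r"_{2,}", "_", name)
    # Remove any underscore prefix.
    return name.lstrip("_")


print(to_destination_name("Some Name"))    # some_name
print(to_destination_name("anotherName"))  # another_name
```

The order of the steps matters: the joined-word split has to run before lower-casing, because the capital letter is the only marker of the word boundary.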
Excluding Source Data
Depending on whether an API allows you to do it, we give you the option of not syncing specific tables with Fivetran. This can be done on the Dashboard while you are integrating.
If data in the source changes (e.g. you add new columns or custom fields, or change a data type), Fivetran automatically detects these schema changes and persists them into your destination. After the initial load of data, Fivetran pulls updates of new or changed data from the source. To make these incremental updates, Fivetran maintains an internal set of progress cursors for every table that we sync. Fivetran records progress to a cursor only when an update has been successfully loaded into the destination. This provides an air-tight handoff between syncs so that no data is missed, and it makes Fivetran’s system extremely tolerant of service interruptions. If there is an interruption in service, such as your destination going offline, Fivetran automatically resumes syncing exactly where it left off once your destination is live again (even days or weeks later).
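The cursor handoff described above can be sketched as follows. This is an illustrative model, not Fivetran's code: `sync_table`, the `cursors` dictionary, and the `fetch_changes` and `load` callables are hypothetical names, and an integer cursor is assumed for simplicity.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def sync_table(
    table: str,
    cursors: Dict[str, int],
    fetch_changes: Callable[[str, int], Tuple[List[Row], int]],
    load: Callable[[str, List[Row]], None],
) -> bool:
    """Run one incremental sync; advance the cursor only if the load succeeds."""
    start = cursors.get(table, 0)
    rows, next_cursor = fetch_changes(table, start)
    try:
        load(table, rows)
    except Exception:
        # Load failed (e.g. destination unavailable): leave the cursor as it
        # was, so the next attempt re-reads from the same starting point.
        return False
    # Record progress only after the destination has accepted the batch.
    cursors[table] = next_cursor
    return True
```

Because the cursor moves only after a successful load, a failed or interrupted sync is simply repeated from the same position on the next attempt, however much later that attempt happens.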
For most cloud application connections, Fivetran uses a change data capture (CDC) strategy to pull changes from the service API. The specific column that Fivetran uses to track changes varies by service, but is often a last-modified date column. Because Fivetran does not receive every change that a row goes through, but only the net difference in a changed row between syncs, Fivetran does not support snapshotting of the data, and instead supports a model of eventual consistency.
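To make the consequence of tracking changes by a modification timestamp concrete, here is a minimal sketch. The column name `last_modified`, the function `pull_changes`, and the in-memory list standing in for the service API are all assumptions for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def pull_changes(source_rows: List[Row], cursor: datetime) -> Tuple[List[Row], datetime]:
    """Return rows modified after `cursor`, plus the new cursor value."""
    changed = [row for row in source_rows if row["last_modified"] > cursor]
    # If nothing changed, the cursor stays where it was.
    new_cursor = max((row["last_modified"] for row in changed), default=cursor)
    return changed, new_cursor


# A row that went new -> open -> closed between two syncs exists in the source
# only in its latest state, so the intermediate values are never observed.
source = [
    {"id": 1, "status": "closed", "last_modified": datetime(2024, 1, 1, 9, 10)},
    {"id": 2, "status": "new", "last_modified": datetime(2024, 1, 1, 8, 0)},
]
changed, cursor = pull_changes(source, datetime(2024, 1, 1, 9, 0))
```

The pull sees the current state of each changed row, not its history, which is why the destination converges on the source rather than recording every intermediate version.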
By default, cloud integrations sync all new and modified data every 15 minutes. Depending on its size, an update may take longer than 15 minutes. In that case, the next update starts at the next 15-minute interval, for example:
Update X Start: 9:00am
Update X Finish: 9:18am
Update Y Start: 9:30am
You can change the update interval in the Dashboard. A cloud integration sync that encounters an error is retried repeatedly, at the shorter of the sync frequency or 1 hour.
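The scheduling example above (an update that starts at 9:00, finishes at 9:18, and is followed by one at 9:30) amounts to rounding the finish time up to the next slot on the interval grid. The helper below is a sketch of that arithmetic; `next_sync_start` is a hypothetical name, and the choice that an update finishing exactly on a boundary starts its successor at that boundary is an assumption.

```python
from datetime import datetime, timedelta


def next_sync_start(scheduled_start: datetime, finished: datetime,
                    interval: timedelta) -> datetime:
    """Return the first interval slot after `scheduled_start` that is not before `finished`."""
    elapsed = finished - scheduled_start
    # Ceiling division: timedelta // timedelta yields an int.
    slots = -(-elapsed // interval)
    # Always move forward by at least one interval.
    return scheduled_start + max(slots, 1) * interval


start = datetime(2024, 1, 1, 9, 0)
finish = datetime(2024, 1, 1, 9, 18)
print(next_sync_start(start, finish, timedelta(minutes=15)))  # 2024-01-01 09:30:00
```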
In general, Fivetran will never delete data from your destination. Fivetran handles deleted data differently for different integrations.
In integrations where the API supports it, we create an extra column in each table:
- “is_deleted” (boolean) to detect if rows were deleted in the source object. We mark the row as ‘TRUE’ if it is deleted in the source.
Some APIs do not allow us to detect or indicate when data has been deleted in the source.
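Where deletes can be detected, the effect of the is_deleted column can be sketched as a soft delete: the row stays in the destination and is only flagged. The function names and the in-memory `destination` dictionary below are hypothetical and stand in for a warehouse table.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def mark_deleted(destination: Dict[Any, Row], deleted_ids: Iterable[Any]) -> None:
    """Flag rows deleted in the source instead of removing them."""
    for row_id in deleted_ids:
        row = destination.get(row_id)
        if row is not None:
            row["is_deleted"] = True


def active_rows(destination: Dict[Any, Row]) -> List[Row]:
    """Return rows that have not been flagged as deleted."""
    return [row for row in destination.values() if not row.get("is_deleted", False)]
```

Queries that should reflect the current state of the source would filter on the flag, while historical analyses can still see the deleted rows.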