Fivetran connects to all of your supported data sources and loads their data into your destination. Each data source has one or more connectors, each running as an independent process that persists for the duration of one update. A single Fivetran account, made up of multiple connectors, loads data from multiple data sources into one or more destinations.
(Diagram: System Architecture Overview)
Fivetran connects to your data sources using our connectors. Fundamentally, there are two different types of connectors: push and pull.
Fivetran’s pull connectors actively retrieve, or pull, data from a source. Fivetran connects to and downloads data from a source system at a fixed frequency. We use an SSL-encrypted connection to the source system to retrieve data using a variety of methods: database connections via ODBC/JDBC, or web service APIs via REST and SOAP. In practice, the method or combination of methods is different for every source system.
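As an illustration only, a pull sync can be sketched as an incremental poll. The source rows and the `fetch_changes_since` helper below are hypothetical stand-ins for a real ODBC/JDBC query or REST/SOAP call over an SSL connection:

```python
# Hypothetical stand-in for a real source system; a production pull
# connector would query the source over ODBC/JDBC or a web service API.
SOURCE_ROWS = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z", "name": "alice"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00Z", "name": "bob"},
]

def fetch_changes_since(cursor: str) -> list[dict]:
    """Return only rows modified after `cursor` (an incremental pull)."""
    return [r for r in SOURCE_ROWS if r["updated_at"] > cursor]

def run_pull_sync(cursor: str) -> tuple[list[dict], str]:
    """One scheduled update: pull changed rows, then advance the cursor."""
    rows = fetch_changes_since(cursor)
    if rows:
        cursor = max(r["updated_at"] for r in rows)
    return rows, cursor

rows, cursor = run_pull_sync("2024-01-01T00:00:00Z")
```

Tracking a cursor this way is what lets each fixed-frequency update retrieve only the data that changed since the previous one.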
Fivetran’s push connectors, by contrast, receive data that the source system sends, or pushes, to Fivetran:
- When we receive the events in our collection service, we first buffer the events in the queue.
- We store the event data as JSON in our cloud storage buckets. (For more information, see our data retention documentation.)
- During the sync, we push the data to your destination.
For more information on how sync works in our push connectors, see our Events documentation.
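The push flow above can be sketched with an in-memory queue standing in for the durable buffer and a list of JSON strings standing in for the cloud storage bucket (all names here are illustrative, not Fivetran internals):

```python
import json
import queue

event_queue = queue.Queue()  # in-memory stand-in for the durable buffer

def collect(event: dict) -> None:
    """Collection service: buffer the incoming event in the queue."""
    event_queue.put(event)

def flush_to_storage() -> list[str]:
    """Persist buffered events as JSON (stand-in for cloud storage)."""
    stored = []
    while not event_queue.empty():
        stored.append(json.dumps(event_queue.get()))
    return stored

collect({"type": "page_view", "user": "u1"})
collect({"type": "click", "user": "u2"})
stored = flush_to_storage()
```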
Ingest and Prepare Data
Once the connector process ingests the query results, Fivetran normalizes, cleans, sorts, and de-duplicates the data. The purpose of this normalization and cleaning is to format the data in the way that is optimal for the destination.
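A minimal sketch of this prepare step, assuming a toy schema with an `id` primary key and an `updated_at` column (not Fivetran’s actual logic):

```python
def prepare(records: list[dict]) -> list[dict]:
    """Normalize, sort, and de-duplicate records by primary key,
    keeping the most recent version of each row."""
    latest: dict[int, dict] = {}
    for rec in sorted(records, key=lambda r: r["updated_at"]):
        rec = {**rec, "name": rec["name"].strip().lower()}  # normalize
        latest[rec["id"]] = rec  # later versions overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r["id"])

rows = prepare([
    {"id": 1, "updated_at": 2, "name": " Alice "},
    {"id": 1, "updated_at": 1, "name": "ALICE"},
    {"id": 2, "updated_at": 1, "name": "Bob"},
])
```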
The Fivetran philosophy is to make a faithful replication of source data with as few transforms as necessary to make it useful.
Fivetran uses a queue to buffer the incoming source data. When a destination load fails because of a transient error or destination unavailability, Fivetran’s pipeline doesn’t re-retrieve from the source the data that is already in the queue. This limits the impact of destination outages and improves Fivetran’s reliability. When we find unprocessed data in the storage queue because of earlier destination load failures, we process the pending queued data first.
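This queue-first retry behavior can be sketched as follows; the in-memory deque and the `destination_up` flag are simplifications of the real durable queue and failure detection:

```python
from collections import deque

pending = deque()  # batches left queued by earlier failed loads

def load_batch(batch: list[dict], destination_up: bool) -> bool:
    """Attempt a destination load; on failure, keep the batch queued
    instead of re-retrieving it from the source later."""
    if not destination_up:
        pending.append(batch)
        return False
    return True

def run_sync(new_batch: list[dict], destination_up: bool) -> list:
    """Process pending queued data first, then the new batch."""
    loaded = []
    while pending and destination_up:
        loaded.append(pending.popleft())
    if load_batch(new_batch, destination_up):
        loaded.append(new_batch)
    return loaded

first = run_sync([{"id": 1}], destination_up=False)  # destination down
second = run_sync([{"id": 2}], destination_up=True)  # destination back up
```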
During the ingestion process, the buffered data is encrypted at rest using a secret ephemeral key and retained until we successfully load it into the destination.
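The ephemeral-key lifecycle can be illustrated with a deliberately toy cipher. A real pipeline would use an authenticated cipher such as AES-GCM, and nothing here reflects Fivetran’s actual cryptography:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # NOT real encryption -- a stand-in used only to illustrate the
    # ephemeral-key lifecycle; production systems use authenticated
    # ciphers such as AES-GCM.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = secrets.token_bytes(32)              # ephemeral key, never persisted
ciphertext = xor_bytes(b'{"id": 1}', key)  # buffered data at rest
plaintext = xor_bytes(ciphertext, key)     # decrypted only at load time
del key                                    # key discarded after the load
```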
The ingestion processes run in parallel with the preparation and load processes. This strategy ensures that the destination data load process doesn’t block the source data ingestion process.
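A minimal sketch of this decoupling, using a thread and an in-memory queue as stand-ins for the separate ingestion and load processes:

```python
import queue
import threading

buffer = queue.Queue()       # hands rows from ingestion to the loader
loaded: list[dict] = []

def ingest(rows: list[dict]) -> None:
    """Ingestion: push rows into the buffer without waiting on the load."""
    for row in rows:
        buffer.put(row)
    buffer.put(None)  # sentinel: ingestion finished

def load() -> None:
    """Load: drain the buffer into the destination as rows arrive."""
    while (row := buffer.get()) is not None:
        loaded.append(row)

loader = threading.Thread(target=load)
loader.start()  # the loader runs while ingestion is still producing
ingest([{"id": 1}, {"id": 2}, {"id": 3}])
loader.join()
```

Because the loader blocks only on `buffer.get()`, a slow destination never stalls the ingestion side; rows simply accumulate in the buffer.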
Load Data into Temporary Data Storage
Fivetran outputs the finalized records to a file in a file storage bucket. This file is encrypted with a separate ephemeral key that is known only to the process performing the write. This temporary file is automatically deleted after 24 hours via an expiration policy on the bucket. The bucket service depends on the destination.
Load Data into Destination
From the temporary data storage, Fivetran copies the file into a staging schema in the destination. In the process, the file’s ephemeral encryption key is transmitted to the destination so that it can decrypt the data as it arrives. Lastly, the updates in the staging schema are upserted into the destination table. Once the update is complete, the connector process terminates itself; a system scheduler later restarts the process for the next update.
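The staging-then-upsert pattern can be sketched with SQLite standing in for the destination warehouse; the table names are illustrative, and real destinations often express the final step as a `MERGE` statement rather than SQLite’s `ON CONFLICT` clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE staging_users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# Copy the decrypted file contents into the staging schema ...
conn.executemany("INSERT INTO staging_users VALUES (?, ?)",
                 [(2, 'robert'), (3, 'carol')])

# ... then upsert the staged rows into the destination table.
# (SQLite needs the `WHERE true` to disambiguate INSERT..SELECT upserts.)
conn.execute("""
    INSERT INTO users (id, name)
    SELECT id, name FROM staging_users WHERE true
    ON CONFLICT(id) DO UPDATE SET name = excluded.name
""")
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

Existing row 2 is updated in place while row 3 is inserted, which is exactly the update-or-insert semantics an upsert provides.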