Fivetran connects to all of your supported data sources and loads the data from them into your destination. Each data source has one or more connectors, each of which runs as an independent process that persists for the duration of one update. A single Fivetran account, made up of multiple connectors, loads data from multiple data sources into one or more destinations.
System Architecture Overview Diagram
Fivetran connects to your data sources using our connectors. Fundamentally, there are two different types of connectors: push and pull.
Fivetran’s pull connectors actively retrieve, or pull, data from a source. Fivetran connects to and downloads data from a source system at a fixed frequency. We use an SSL-encrypted connection to the source system to retrieve data using a variety of methods: database connections via ODBC/JDBC, or web service APIs via REST and SOAP. In practice, the method or combination of methods is different for every source system.
Fivetran’s push connectors receive data that a source sends, or pushes, to them. In push connectors, such as Webhooks or Snowplow, source systems send data to Fivetran as events. As the events arrive, Fivetran stores them as JSON in a file bucket in a cloud storage service. Periodically, a process pulls the new events from that bucket and subsequently follows the same steps as a pull connector.
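The push flow described above can be illustrated with a minimal sketch. It is not Fivetran's implementation: a local temporary directory stands in for the cloud storage bucket, and the names `receive_event` and `drain_bucket` are hypothetical helpers invented for this example.

```python
import json
import tempfile
from pathlib import Path


def receive_event(bucket: Path, event: dict, seq: int) -> Path:
    """Store one pushed event as a JSON file in the bucket directory."""
    # Zero-padded sequence numbers keep file names in arrival order.
    path = bucket / f"event-{seq:08d}.json"
    path.write_text(json.dumps(event))
    return path


def drain_bucket(bucket: Path) -> list:
    """Read all buffered events in arrival order, then remove them."""
    events = []
    for path in sorted(bucket.glob("event-*.json")):
        events.append(json.loads(path.read_text()))
        path.unlink()  # the event has been handed to the pull-style pipeline
    return events


# Assumption: a local temp directory plays the role of the file bucket.
bucket = Path(tempfile.mkdtemp())
receive_event(bucket, {"id": 1, "type": "page_view"}, seq=1)
receive_event(bucket, {"id": 2, "type": "click"}, seq=2)

# The periodic process picks up everything that arrived since the last run.
batch = drain_bucket(bucket)
```

After `drain_bucket` returns, the batch would go through the same ingest, prepare, and load steps as data retrieved by a pull connector.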
Ingest and Prepare Data
Once the connector process ingests the query results from the source, Fivetran normalizes, cleans, sorts, and de-duplicates the data. The purpose of the normalization and cleaning is to put the data into the format best suited to the destination. (Learn more about this optimization here.)
The Fivetran philosophy is to make a faithful replica of the source data, applying only as many transformations as are needed to make it useful. During the ingestion process, records may be temporarily buffered on disk, encrypted using a secret ephemeral key.
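As a rough illustration of the normalize, clean, sort, and de-duplicate step, the sketch below processes a small batch of records in memory. The rules it applies (lowercased column names, trimmed strings, last-write-wins on a version column) and the field names `id` and `updated_at` are assumptions chosen for the example, not Fivetran's actual rules.

```python
def prepare(records, key="id", version="updated_at"):
    """Normalize, clean, de-duplicate, and sort a batch of records."""
    latest = {}
    for rec in records:
        # Normalize column names and clean stray whitespace from strings.
        clean = {
            k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in rec.items()
        }
        # De-duplicate: keep only the newest version of each primary key.
        current = latest.get(clean[key])
        if current is None or clean[version] >= current[version]:
            latest[clean[key]] = clean
    # Sort by primary key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


raw = [
    {"ID": 2, "Name": " Bob ", "updated_at": 1},
    {"ID": 1, "Name": "Ann", "updated_at": 1},
    {"ID": 2, "Name": "Bobby", "updated_at": 2},  # newer version of id 2
]
rows = prepare(raw)
```

Here `rows` holds one record per `id`, with the later `Bobby` row replacing the earlier one. The values themselves are left untouched apart from the whitespace trim, in keeping with the goal of a faithful replica.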
Load Data into Temporary Data Storage
Fivetran outputs the finalized records to a file in a file storage bucket. This file is encrypted with a separate ephemeral key that is known only to the process performing the write. This temporary file is automatically deleted after 24 hours via an expiration policy on the bucket. The bucket service depends on the destination.
Load Data into Destination
Next, Fivetran copies the file into a staging schema in the destination. In the process, the ephemeral encryption key for the file is transmitted to the destination so it can decrypt the data as it arrives. Lastly, Fivetran upserts the changes from the staging schema into the destination table. The update is then complete, and the connector process terminates. A system scheduler later restarts the process for the next update.
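The staging-then-upsert pattern can be sketched with an in-memory SQLite database standing in for the destination. The table names `users` and `staging_users` are hypothetical, and the SQL a real destination requires (for example a `MERGE` statement) will differ; this only shows the shape of the step.

```python
import sqlite3

# Assumption: SQLite plays the role of the destination warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staging_users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
""")

# Step 1: copy the finalized records into the staging table.
conn.executemany(
    "INSERT INTO staging_users VALUES (?, ?)",
    [(2, "Bobby"), (3, "Cy")],
)

# Step 2: upsert from staging into the destination table.
# "WHERE 1" is needed so SQLite can parse ON CONFLICT after a SELECT.
conn.execute("""
    INSERT INTO users (id, name)
    SELECT id, name FROM staging_users WHERE 1
    ON CONFLICT (id) DO UPDATE SET name = excluded.name
""")

# Step 3: clear staging so the next update starts from an empty table.
conn.execute("DELETE FROM staging_users")
conn.commit()

rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

Existing row 2 is updated in place, new row 3 is inserted, and row 1 is left unchanged, which is the behavior an upsert is meant to provide. The `ON CONFLICT` clause requires SQLite 3.24 or later.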