The data ingested by my Google Analytics 360 (GA360) connector lands at sporadic times in my warehouse.
- To ensure data is ingested as soon as it is available, configure the sync frequency of the connector to be as short as possible
Our GA360 connector uses Google’s BigQuery export functionality to ingest GA360 data from BigQuery in batches. GA360 data is written to BigQuery in four batches per day:
- One provides a full file of the previous day's data: this is called the Daily file
- Three are exported during the day with the current day's data: these are called IntraDay files
We read all files to ensure data integrity, but because GA360 writes to BigQuery in batches, the shortest interval at which you will see new GA360 data ingested is every 6 hours.
Fivetran uses the IntraDay tables to give you data that is as complete as possible throughout the day, and uses the Daily file to ensure the data is correct as of the last export.
- One important consideration is that Fivetran only reads IntraDay tables up to 4 GB in size; for IntraDay tables larger than this, we ingest only the Daily file
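The size rule above can be sketched as a small check. This is only an illustration and not Fivetran's actual implementation: the `ExportTable` class, the `should_ingest` function, and the table names are hypothetical, and the sketch assumes the limit is 4 * 1024**3 bytes and is inclusive, which the text above does not specify.

```python
from dataclasses import dataclass

# Assumption: "4 GB" is treated as 4 * 1024**3 bytes, limit inclusive.
INTRADAY_SIZE_LIMIT_BYTES = 4 * 1024**3


@dataclass
class ExportTable:
    """Hypothetical description of one GA360 export table in BigQuery."""
    name: str
    kind: str  # "intraday" or "daily"
    size_bytes: int


def should_ingest(table: ExportTable) -> bool:
    """Return True if the table would be read under the rule described above."""
    if table.kind == "daily":
        # The Daily file is always read.
        return True
    # IntraDay tables are read only while they stay within the size limit.
    return table.size_bytes <= INTRADAY_SIZE_LIMIT_BYTES


small_intraday = ExportTable("intraday_example", "intraday", 1 * 1024**3)
large_intraday = ExportTable("intraday_example", "intraday", 5 * 1024**3)
daily = ExportTable("daily_example", "daily", 5 * 1024**3)

for table in (small_intraday, large_intraday, daily):
    print(table.kind, table.size_bytes, should_ingest(table))
```

With an oversized IntraDay table, the day's data would arrive only once the Daily file is available.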
Regarding the timing of your sync schedule within the Fivetran connector, there is no guarantee as to when the GA360 data will be available in BigQuery.
- Each new IntraDay file overwrites the previous one, and the final IntraDay file is then overwritten by the Daily file, so there is never a conflict in the data being ingested
- Fivetran’s built-in idempotency ensures that if a row appears in multiple files within the day (due to a long-running session), then that row will be updated in the warehouse with the latest data based on the last file ingested
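
The update behavior described above can be illustrated with a minimal upsert sketch. It is not Fivetran's implementation: the `apply_file` function, the in-memory `warehouse` dictionary, and the `session_id` and `pageviews` fields are hypothetical stand-ins chosen for the example.

```python
def apply_file(warehouse: dict, rows: list, key_fields: tuple) -> None:
    """Upsert each row by its key: a later file replaces the earlier version."""
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        warehouse[key] = row


# Hypothetical files for one day; session "s1" is long-running and
# appears in every file with a growing pageview count.
intraday_1 = [{"session_id": "s1", "pageviews": 2}]
intraday_2 = [
    {"session_id": "s1", "pageviews": 5},
    {"session_id": "s2", "pageviews": 1},
]
daily_file = [
    {"session_id": "s1", "pageviews": 7},
    {"session_id": "s2", "pageviews": 1},
]

warehouse = {}
# Ingesting the Daily file twice shows that re-reading a file changes nothing.
for batch in (intraday_1, intraday_2, daily_file, daily_file):
    apply_file(warehouse, batch, ("session_id",))

print(len(warehouse))
print(warehouse[("s1",)]["pageviews"])
```

Because rows are keyed rather than appended, the warehouse ends with one row per session, holding the values from the last file ingested.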