Issue
There is a data mismatch between the Google Analytics UI and the destination when the report contains a large amount of data.
Environment
Connector: Google Analytics
Resolution
The mismatch can appear because the Google Analytics UI applies data sampling, even though the data in the destination is exactly what was retrieved from the Google Analytics API.
How to verify if the data is being sampled in the Google Analytics UI
A green shield icon means your report is unsampled and contains all your data.
A yellow shield icon means you’re looking at sampled data. If you hover your mouse over the yellow shield icon, you’ll see a message that tells you how big the sample size is.
To verify that the Google Analytics UI data matches the destination, reduce the date range of your report so that it stays under the sampling threshold.
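As a rough illustration of shrinking the date range, the sketch below finds the last date a report can extend to before the cumulative session count passes the threshold. It assumes you already have daily session counts; the `daily_sessions` values and the function name `longest_unsampled_range` are hypothetical, and the 500k default is the Analytics Standard threshold listed under Cause.

```python
from datetime import date, timedelta


def longest_unsampled_range(daily_sessions, start, threshold=500_000):
    """Return the last date such that the sessions from `start` through that
    date stay at or below `threshold`, or None if `start` alone exceeds it."""
    total = 0
    end = None
    day = start
    while day in daily_sessions:
        total += daily_sessions[day]
        if total > threshold:
            break  # adding this day would push the report into sampling
        end = day
        day += timedelta(days=1)
    return end


# Hypothetical daily session counts for one property.
daily_sessions = {
    date(2023, 1, 1): 200_000,
    date(2023, 1, 2): 250_000,
    date(2023, 1, 3): 100_000,
}

end = longest_unsampled_range(daily_sessions, date(2023, 1, 1))
print(end)  # 2023-01-02
```

With these numbers, a report for 1 to 2 January covers 450k sessions and should stay unsampled, while adding 3 January brings it to 550k.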
Cause
Sampling kicks in when you apply segments or secondary dimensions, or when you create custom reports.
Default reports are not subject to sampling, but ad-hoc queries of your data are subject to the following general thresholds for sampling:
- Analytics Standard: 500k sessions at the property level for the date range you are using
- Analytics 360: 100M sessions at the view level for the date range you are using
  - Queries may include events, custom variables, and custom dimensions and metrics. All other queries have a threshold of 1M
This means that if an Analytics Standard customer views a report in the GA UI that contains more than 500k sessions for the selected date range, the data will be sampled. There will be a slight mismatch between the GA UI (sampled data) and the destination (unsampled data).
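The threshold check described above can be sketched as a small lookup. This is only an illustration of the rule of thumb quoted in this article, not how Google Analytics decides to sample internally; the names `SAMPLING_THRESHOLDS` and `may_be_sampled` are hypothetical.

```python
# Thresholds quoted in this article (sessions within the selected date range).
SAMPLING_THRESHOLDS = {
    "standard": 500_000,     # Analytics Standard, property level
    "360": 100_000_000,      # Analytics 360, view level
}


def may_be_sampled(sessions_in_range, tier="standard"):
    """Return True if an ad-hoc query over this many sessions exceeds
    the general sampling threshold for the given tier."""
    return sessions_in_range > SAMPLING_THRESHOLDS[tier]


print(may_be_sampled(750_000))         # True
print(may_be_sampled(750_000, "360"))  # False
```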
Because our requests pull the report data one day at a time, each request covers fewer sessions and is less likely to exceed the sampling threshold, so the synced data is less likely to be sampled.
However, the synced data can still be sampled if the number of sessions for a single day exceeds the threshold (500k for Analytics Standard, 100M for Analytics 360).
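To check whether a given day's data was sampled, you can inspect the API response directly. The sketch below assumes the Google Analytics Reporting API v4, where a sampled report carries `samplesReadCounts` and `samplingSpaceSizes` in its `data` object; this article does not state which API version the connector uses, so treat that as an assumption. The helper names `daily_ranges` and `sampling_rate` and the sample responses are hypothetical.

```python
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield one single-day date range per day from start to end (inclusive),
    in the startDate/endDate shape used by Reporting API v4 requests."""
    day = start
    while day <= end:
        iso = day.isoformat()
        yield {"startDate": iso, "endDate": iso}
        day += timedelta(days=1)


def sampling_rate(report):
    """Return the fraction of sessions read for a v4 report, or None when
    the sampling fields are absent (the report is unsampled)."""
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not read or not space:
        return None
    # The API returns these counts as strings, one entry per date range.
    return int(read[0]) / int(space[0])


ranges = list(daily_ranges(date(2023, 1, 30), date(2023, 2, 1)))
print(len(ranges))  # 3

# Hypothetical responses: one sampled report, one unsampled report.
sampled = {"data": {"samplesReadCounts": ["500000"],
                    "samplingSpaceSizes": ["2000000"]}}
unsampled = {"data": {"rows": []}}
print(sampling_rate(sampled))    # 0.25
print(sampling_rate(unsampled))  # None
```

A value of `None` for every day in the range suggests the synced data is unsampled; any other value gives the share of sessions the report was based on for that day.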