Why is Fivetran using these large merge queries? The merge query starts with this:
MERGE `connector`.`event` AS `#existing`
Guidelines for reducing BigQuery usage:
- Set the connector sync frequency to as infrequent as you can while still meeting your business needs.
- For example, a connector syncing every 5 minutes will have a much higher warehouse query cost than one syncing every 6 hours as the merge and update queries need to run for every incremental sync.
- Cleanup large tables in your source and destination. If there is old data you don't need in your warehouse, you should delete that from your destination to limit how much data is scanned for merge queries.
- If using a database connector, removing old records form the source is advised so we don't sync the old data if you initiate a table / historical resync.
- If the connector support selecting specific tables in the schema tab, disable unnecessary tables / endpoints.
- Some connectors such as database connectors allow you to split large tables out into their own connector so the larger table has a different sync frequency.
- Use BigQuery to partition your larger tables so the merge query scans less overall data.
Additional reading about BigQuery paritioning:
The merge and update queries are normal as those are for deduplication of data.
We'll pull only incremental updates from the source, but we need to deduplicate that against everything in the destination warehouse to ensure data integrity.
Our sync process creates temporary tables for this purpose, and BigQuery counts queries to those as usage.