How do I manage my usage costs? Why is Fivetran using large
MERGE queries that impact my data load costs? Queries such as:
MERGE `connector`.`event` AS `#existing` USING `fivetran_name_staging`.`temp_table_name_here`
UPDATE queries in your warehouse for data de-duplication.
To reduce your BigQuery usage costs, do the following:
Set your connector’s sync frequency to as infrequent as you can while still meeting your business needs. For example, a connector syncing every five minutes will have a much higher warehouse query cost than one syncing every six hours as the merge and update queries run for every incremental sync.
Optimize (clean) tables in your source and destination. If there is old data you don’t need in your destination, delete the data from your destination to limit the amount of data we scan for the
MERGEqueries. If you are using a database connector, remove old records from the source so that you don’t sync the old data if you trigger a historical or table re-sync.
If your connector supports selecting specific tables in the Schema tab, deselect unnecessary tables or endpoints from the sync. Some database connectors allow you to split large tables into individual connectors. You can set a different sync frequency for the larger tables.
Use BigQuery to partition your larger tables so that our
MERGEqueries scan less overall data. For more information about BigQuery partitioning, see the following resources:
Fivetran uses SQL queries to merge data into the tables in your BigQuery data warehouse. BigQuery bills for query usage and charges you when we load data in your data warehouse.
UPDATE queries for data de-duplication. We pull incremental updates from the source. However, we need to de-duplicate the data by referencing it against the complete data in the destination to ensure data integrity. Our sync process creates temporary tables for this purpose, and BigQuery counts queries to these temporary tables as usage. For more information, see our data load costs documentation.