Destination Improvement: Optimize the queries written by HVR to load data to Google BigQuery

We have some tables which has high volume of data and we were planning to do the partition of table so that complete table is not scanned during data load.
We analyzed the queries executed by Fivetran in BigQuery (e.g., DELETE, MERGE, UPDATE, etc.) and found that there was no significant column from the main table that could be used for partitioning.
However, we identified one potential candidate for partitioning the "sap_arch_deleted" column. This column is used in some of the queries generated by Fivetran, so we thought it could be effective for partitioning.
Upon further inspection, we found that this filter is applied in a subquery (as part of an EXISTS function), which prevents it from driving partitioning behavior on the main table.

Generated query from Fivetran:
delete from `zp-data-platform-stg.Quantiphi_UT.BSEG_4 bas_ where exists (select 1 from `sap-replication-prod.RAW_ECP.bseg__b` bur_ where (bas_.`MANDT`=bur_.`MANDT` or (bas_.`MANDT` is null and bur_.`MANDT` is null)) and (bas_.`BUKRS`=bur_.`BUKRS` or (bas_.`BUKRS` is null and bur_.`BUKRS` is null)) and (bas_.`BELNR`=bur_.`BELNR` or (bas_.`BELNR` is null and bur_.`BELNR` is null)) and (bas_.`GJAHR`=bur_.`GJAHR` or (bas_.`GJAHR` is null and bur_.`GJAHR` is null)) and (bas_.`BUZEI`=bur_.`BUZEI` or (bas_.`BUZEI` is null and bur_.`BUZEI` is null)) and bas_.`sap_arch_deleted`<>0 and bur_.`hvr_op`=1);

If we rewrite the query as below then partitions can be leveraged.
delete from `zp-data-platform-stg.Quantiphi_UT.BSEG_4` bas_
where
bas_.`sap_arch_deleted`<>0
AND exists (select 1 from `sap-replication-prod.RAW_ECP.bseg__b` bur_ where (bas_.`MANDT`=bur_.`MANDT` or (bas_.`MANDT` is null and bur_.`MANDT` is null)) and (bas_.`BUKRS`=bur_.`BUKRS` or (bas_.`BUKRS` is null and bur_.`BUKRS` is null)) and (bas_.`BELNR`=bur_.`BELNR` or (bas_.`BELNR` is null and bur_.`BELNR` is null)) and (bas_.`GJAHR`=bur_.`GJAHR` or (bas_.`GJAHR` is null and bur_.`GJAHR` is null)) and (bas_.`BUZEI`=bur_.`BUZEI` or (bas_.`BUZEI` is null and bur_.`BUZEI` is null)) and bur_.`hvr_op`=1);

Mark Van de Wiel User

November 27, 2024 13:01

Hi Samyak,

There is an argument that the BigQuery query optimizer should be smart enough to recognize the ability to use partitioning with the way HVR generated this query.

However, I also think that it would generally be cleaner if we uplifted the restriction on the driving table out of the sub query.

Let me discuss with engineering whether we can do this easily.

Thanks,
Mark.

September 29, 2025 02:54

FYI Samyak engineering made code changes that are currently in QA. I look forward to releasing this improvement soon.

Edwin Fivetranner

February 12, 2026 16:57

We have made progress with some engineering work, and it would be probably good to plan some time to discuss those. Let me know when you are open to that.

Best regards,

Edwin

Community

Destination Improvement: Optimize the queries written by HVR to load data to Google BigQuery

Comments