Other: HVR/LDP - Snowflake Location - specify 2nd warehouse for "state" table updates
Hi,
We have noticed a significant credit consumption cost for Snowflake with locations setup with using larger warehouse sizes. The locations do require & benefit from larger warehouse size to handle consumption of large batches of data into Snowflake.
The larger warehouse size is required for the COPY INTO Burst table, and MERGE INTO steps. These SQL commands benefit from up-sized warehouse, but I would argue for other commands an XS warehouse would suffice.
Observing the "chat" in Snowflake Query history we can see a lot of quick SQL updates to HVR internal tables which do not benefit or need a larger warehouse size, & should be running against a lower cost XS warehouse, e.g.
update "HVR_ISMIG_NONBATCH_SFK_MUSBI_A" set "LAST_INTEG_END" = ?, "TRAILER_CAP_LOC" = ?, "TRAILER_CAP_BEGIN" = ?, "TRAILER_CAP_BEGIN_INT" = ?, "LEADER_CAP_LOC" = ?, "LEADER_CAP_BEGIN" = ?, "LEADER_CAP_BEGIN_INT" = ?, "SESSION_NAME" = ?
update "HVR_IBMIG_NONBATCH_SFK_MUSBI_A" set "IS_BUSY" = ?, "TBL_NAME" = ?, "HVR_OP" = ? where "PARALLEL_SESSION" = ?
update "HVR_ICMIG_FACT_SFK_MIG_FACT" set "BURST_BUSY_TX_SEQ" = ? where "HVR_CAP_LOC" = ?
select "PARALLEL_SESSION", "IS_BUSY", "TBL_NAME", "HVR_OP" from "HVR_IBMIG_NONBATCH_SFK_MUSBI_A" where parallel_session=0
Suggestion: Could we specify TWO warehouse settings in Snowflake Locations?
1. Warehouse for SQL Inserts , e.g. SFK_WH_Medium
2. Warehouse for HVR internal processes: e.g. SFK_WH_XSmall
I hope you consider this suggestion, as it would be well received by Snowflake customers :-)
Cheers
Terry
-
Hi Terry,
In line with the discussion on the forum on this topic I don't see this making sense some time soon.
If you have periods during the week with little or no load, then using an action Scheduling can make sense. E.g. run less frequently during off peak hours, or just use IntegrateStartAfterCapture can significantly reduce the amount of time that your Snowflake warehouse is available.
Hope this makes sense.
Thanks,
Mark. -
Hi Mark,
Thanks for your response. May be this specific "use case" of HVR replication we have is uncommon.
We need to leave HVR replication running 24/7 - as have no hard time schedules for processing large batches tied to a specific channel where we have allocated a larger medium sized warehouse.
I just wanted a method of reducing cost of "idle" synchronisation SQL calls driven by HVR background processes - consuming credits adequately served by an XS warehouse, as opposed to the REAL data ingestion tasks.
Hope that supplies more context.
Cheers
Terry
-
Hi Terry,
Thanks for that context.
Please note that the idle cycles don't happen when you configure action Scheduling with option IntegrateStartAfterCapture. With that setting - after activation - the integrate is kicked off by capture when new changes are captured. This is great for when you have periods during the week with no or almost no activity.
The only minor downside of not running the regular idle integrate cycles is that the recovery point in the destination state table does not move forward. In case of some failover scenario when you might re-activate the starting point of the channel based on the integrate state table it would lead to longer capture rewind. However, if there is anyway little activity on the database overall then the extra duration to capture the longer log history should be very limited.
The other option would be to lower the frequency of the idle integrate cycles. You could do this with a job environment setting at the system level. Use variable name HVR_GLOB_MAX_WAIT with value 120.00 to lower the frequency to once every 2 hours (from the default once every 10 minutes).
Hope this provides some options for you to lower your Snowflake bill.
Thanks,
Mark.
Please sign in to leave a comment.
Comments
3 comments