Now that support for BigQuery table clustering has been added to Fivetran, I would suggest that new tables for common integrations be created with clustering enabled by default on the columns that make up each table's primary key. This would greatly reduce the number of bytes scanned—and the resulting costs—when the MERGE queries for table updates are run. Because Fivetran already documents these connector schemas, that documentation could serve as the initial reference for implementing this (and connector usage could be used to prioritize rollout).
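As a rough sketch of what this would look like (table and column names here are hypothetical, not Fivetran's actual schema), the destination table would be created with a `CLUSTER BY` clause on the primary-key columns, so that the subsequent MERGE only scans the blocks containing matching keys:

```sql
-- Hypothetical destination table for an "orders" connector,
-- clustered on the primary-key column so MERGEs prune blocks by key.
CREATE TABLE dataset.orders (
  order_id    INT64,
  customer_id INT64,
  amount      NUMERIC,
  updated_at  TIMESTAMP
)
CLUSTER BY order_id;

-- The incremental-sync MERGE then benefits from clustering automatically:
MERGE dataset.orders AS t
USING dataset.orders_staging AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET customer_id = s.customer_id,
             amount = s.amount,
             updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, customer_id, amount, updated_at)
  VALUES (s.order_id, s.customer_id, s.amount, s.updated_at);
```

No query changes are needed beyond the one-time DDL, which is what makes this attractive as a default.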
While this may not make sense for every connector type, a common-sense default during the creation or editing of the connection would help control costs—at least for common connectors—without the customer needing to know the specifics of BigQuery.
For append-only style tables with date-based keys, additional cost benefits could be realized by offering default table partitioning—even if this had to be done at month or year granularity to avoid hitting BigQuery's per-table partition count limit. Unlike today, these tables could then benefit from the cost savings of BigQuery's automatic long-term storage pricing, which kicks in for any partition that hasn't been modified in 90 days. Details on time-unit partitioning (recently moved to GA) are available here.
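For illustration (again with hypothetical names), an append-only event table could combine coarse time-unit partitioning with primary-key clustering, so old partitions age into long-term storage while new rows land only in the current partition:

```sql
-- Hypothetical append-only events table: monthly partitions keep the
-- partition count low, and only the current month's partition is
-- touched by new inserts, letting older partitions qualify for
-- long-term storage pricing after 90 days without modification.
CREATE TABLE dataset.events (
  event_id   INT64,
  event_type STRING,
  created_at TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(created_at, MONTH)
CLUSTER BY event_id;
```

Date-range queries against such a table also get partition pruning for free, which compounds the scan-cost savings.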
Since this style of clustering and partitioning would be mostly transparent to the queries Fivetran runs on the warehouse, and BigQuery handles the optimization side automatically, I wouldn't expect a significant engineering effort in other parts of the application post-setup.
If there are concerns about these changes affecting customers with corner cases, a potential solution would be an option on the connection details screen—both when first creating a new connection and when editing an existing one—to choose between a performance/cost-optimized schema (clustered and/or partitioned) and a compatibility schema (standard table). I could also see this being something you could set a default for when adding or editing a destination warehouse.
As always, it makes sense to allow users to customize their destination schema to meet their needs—but a sane default that reduces costs and improves performance would be a great user-experience boost and would help avoid warehouse I/O cost surprises that aren't easy to predict. A platform-managed default is also preferable to requiring users to manually update each of their tables via the existing pause/copy/delete/copy/unpause process.