Question
What is the optimal configuration of Databricks for Fivetran?
Environment
Destination: Databricks
Answer
We recommend the following:
- Always configure your Databricks warehouse according to the Fivetran setup guide.
- Enable autoscale.
- Enable ‘Always On’ only if you use a five-minute sync frequency.
Context
We recommend autoscaling because the load on a Databricks cluster leans towards either compute or memory depending on how long the queries for each table take, which in turn depends on the amount of data being synced per table.
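‘Always On’ is only useful for high-frequency syncs, such as a five-minute sync frequency, because the Databricks cluster start time is relatively short (a boot time of around 1 minute). At longer sync intervals, the cluster can start on demand without adding a significant delay.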
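The reasoning above can be sketched as a small decision helper. This is an illustrative sketch only, not part of Fivetran or Databricks: the names `DestinationSettings` and `recommend_settings` are hypothetical, the 1-minute boot time and the five-minute threshold are taken from this article, and treating intervals shorter than five minutes the same way as five minutes is an assumption.

```python
from dataclasses import dataclass

# Figures taken from this article; treat both as approximate.
CLUSTER_BOOT_MINUTES = 1.0      # boot time of around 1 minute
ALWAYS_ON_MAX_SYNC_MINUTES = 5  # 'Always On' only at a five-minute sync frequency


@dataclass(frozen=True)
class DestinationSettings:
    """Hypothetical container for the settings discussed here (not a real API type)."""
    autoscale: bool
    always_on: bool
    boot_share_of_interval: float


def recommend_settings(sync_frequency_minutes: float) -> DestinationSettings:
    """Return the recommended settings for a given sync interval in minutes."""
    if sync_frequency_minutes <= 0:
        raise ValueError("sync_frequency_minutes must be positive")
    return DestinationSettings(
        # Autoscale is recommended regardless of sync frequency.
        autoscale=True,
        # Assumption: intervals at or below five minutes justify 'Always On'.
        always_on=sync_frequency_minutes <= ALWAYS_ON_MAX_SYNC_MINUTES,
        # Share of each sync interval that a cold start would consume.
        boot_share_of_interval=CLUSTER_BOOT_MINUTES / sync_frequency_minutes,
    )
```

Under these assumptions, a cold start takes up about 20% of a five-minute interval but under 2% of a 60-minute interval, which is why ‘Always On’ only pays off at the shortest sync frequency.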
Considerations
If you experience Databricks performance issues that you believe are related to how Fivetran interacts with your destination, raise a ticket with the Fivetran Support team.