Connector Improvement: Databricks historical table sync using a single MERGE command
For Databricks connector Delta tables, Fivetran currently performs four steps:
- DELETE
- MERGE
- OPTIMIZE
- WRITE
Several minutes may elapse between step 1 and step 4, during which the historical table is in a transient state. If any queries hit the table during this window, the results will be incomplete.
I believe these steps can be combined into a single transactional MERGE statement (I'm not sure why separate DELETE and WRITE steps are needed). This applies to Databricks Delta tables.
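For illustration only, here is a minimal sketch (not Fivetran's actual implementation) of what a single transactional MERGE could look like on a Delta table. The table names, the id join key, the batch path, and the _fivetran_deleted soft-delete flag are all assumptions:

```python
# Hypothetical sketch: fold delete/update/insert into one atomic Delta MERGE
# so readers never observe a half-applied sync.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Register the incoming batch as a temporary view so MERGE can use it as its
# source (path and format are placeholders).
batch_df = spark.read.format("parquet").load("/tmp/incoming_batch")
batch_df.createOrReplaceTempView("incoming_batch")

# One statement, one Delta commit: deletes, updates, and inserts all become
# visible together, so concurrent queries see either the old or new snapshot.
spark.sql("""
    MERGE INTO history_table AS t
    USING incoming_batch AS s
    ON t.id = s.id
    WHEN MATCHED AND s._fivetran_deleted = true THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED AND s._fivetran_deleted = false THEN INSERT *
""")
```

An OPTIMIZE could still run afterwards as a separate maintenance step; since it only compacts files, it would not leave the table in an incomplete state for readers.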
-
Official comment
Hi Omar,
Thanks for reaching out! We are investigating the use of a single staging table for the Databricks connector, where we'd load data into the staging table and then apply it with a single MERGE operation. These improvements are on our short-term roadmap!
Would you be able to describe your use case further? How are you querying the data?
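For clarity, a hedged sketch of what a staging-table flow like the one described above might look like; the table names, join key, and batch source are hypothetical, not a confirmed design:

```python
# Hypothetical two-phase flow: (1) land the batch in a staging Delta table,
# (2) apply it to the historical table with one MERGE so the change is atomic.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Phase 1: write the raw batch to a staging table (names are placeholders).
batch_df = spark.read.format("parquet").load("/tmp/incoming_batch")
batch_df.write.format("delta").mode("overwrite").saveAsTable("staging_history")

# Phase 2: a single MERGE commits all changes at once; readers of
# history_table never see a partially synced state.
spark.sql("""
    MERGE INTO history_table AS t
    USING staging_history AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```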
-
+1 for this feature, Omar. Thanks for the response, Coral; glad to hear this is on the roadmap. The big issue for us is that we have different tables in an upstream SQL source that are committed in a single transaction, but we don't see this downstream. If other users are querying this data as "Bronze" data in Delta/DBX for reporting, they may see inconsistent results (for example, account balances may not align with the aggregate transactions table).