Destination Improvement: Databricks destination should keep connector in a failed state if data inconsistency is detected following a retry.
Status: Planned
We opened a support case (#334621) about a bug in which Fivetran's retry mechanism for the Databricks destination caused data inconsistency in the target table: records that still existed in the source table were marked as soft-deleted. The connector, however, remained in a "green" state, giving false assurance of source-to-target fidelity.
The specific mechanics of the bug scenario were:
- Fivetran issued its multi-step sequence of queries to Databricks to update an SCD2 table in the destination.
- The Databricks SQL warehouse responded with a 500 error to the MERGE query in the middle of this sequence.
- The Fivetran process initiated a retry sequence, but the INSERT step affected a different number of records in the retry than in the initial attempt.
- As a result of this inconsistent retry behavior, several records that were still active in the source table were left in a soft-deleted state in the target SCD2 table.
Had the Fivetran process detected during the retry (the third step above), for example by using the operation metrics returned by Databricks, that the set of records modified by the retry did not match the set modified during the initial attempt, it could have flagged the data inconsistency and left the connector in a broken state, alerting our team to investigate.
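The following is a minimal sketch of the check described above. It assumes that the per-step row-count metrics from the initial attempt and from the retry are both available as dictionaries. The default metric names are modeled on Delta Lake's MERGE operation metrics (`numTargetRowsInserted`, `numTargetRowsUpdated`, `numTargetRowsDeleted`, as reported by `DESCRIBE HISTORY`). The `numOutputRows` name used for the INSERT example is an assumption. `check_retry_consistency` and `DataInconsistencyError` are hypothetical names, and this is not Fivetran's implementation. Raising the exception stands in for putting the connector into a broken state.

```python
from collections.abc import Mapping


class DataInconsistencyError(RuntimeError):
    """Hypothetical error raised to leave the connector in a failed (broken) state."""


# Assumed default metric names, modeled on Delta Lake MERGE operation metrics.
MERGE_METRICS = ("numTargetRowsInserted", "numTargetRowsUpdated", "numTargetRowsDeleted")


def check_retry_consistency(
    step: str,
    initial: Mapping[str, int],
    retry: Mapping[str, int],
    metrics: tuple[str, ...] = MERGE_METRICS,
) -> None:
    """Raise if the retry of `step` touched a different number of rows than the first attempt."""
    mismatches = {
        name: (initial.get(name), retry.get(name))
        for name in metrics
        if initial.get(name) != retry.get(name)
    }
    if mismatches:
        details = ", ".join(
            f"{name}: initial={first}, retry={again}"
            for name, (first, again) in mismatches.items()
        )
        # Fail loudly instead of reporting a "green" sync.
        raise DataInconsistencyError(f"{step} step: retry diverged from initial attempt ({details})")


# Example with hypothetical numbers: the INSERT step affected 120 rows initially
# but 117 on retry ("numOutputRows" is an assumed metric name for an insert).
try:
    check_retry_consistency(
        "INSERT", {"numOutputRows": 120}, {"numOutputRows": 117}, metrics=("numOutputRows",)
    )
except DataInconsistencyError as err:
    print(err)
```

Operation metrics are counts, so this only approximates a set comparison: two different sets of the same size would pass. Any mismatch is treated as a failure rather than retried again. That is the conservative behavior this request asks for.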
In general, it would be preferable for a connector to remain in a broken state if it detects data inconsistency so that we can (a) be aware of the inconsistency and (b) determine how best to fix the issue (which might entail a full re-sync of the affected table).
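Beyond the retry check, a post-sync reconciliation could catch the exact symptom from the support case: keys that exist in the source but have no active row in the target SCD2 table. The sketch below assumes an in-memory list of source primary keys and a set of target rows that carry an active flag. The column names `id` and `_fivetran_active` are assumptions; the latter is modeled on Fivetran's history-mode column. The helper names are hypothetical. On a real warehouse, this would be an anti-join in SQL rather than Python.

```python
from collections.abc import Hashable, Iterable, Mapping


def find_wrongly_soft_deleted(
    source_keys: Iterable[Hashable],
    target_rows: Iterable[Mapping[str, object]],
    key_col: str = "id",
    active_col: str = "_fivetran_active",
) -> set[Hashable]:
    """Return source keys that have no active (current) row in the SCD2 target."""
    active_keys = {row[key_col] for row in target_rows if row[active_col]}
    return set(source_keys) - active_keys


def connector_status(
    source_keys: Iterable[Hashable], target_rows: Iterable[Mapping[str, object]]
) -> str:
    """Report 'broken' rather than 'green' when any source record lacks an active row."""
    return "broken" if find_wrongly_soft_deleted(source_keys, target_rows) else "green"


source = ["a", "b", "c"]
target = [
    {"id": "a", "_fivetran_active": True},
    {"id": "b", "_fivetran_active": False},  # soft-deleted, yet still present in the source
    {"id": "c", "_fivetran_active": False},  # an older, closed-out version of c ...
    {"id": "c", "_fivetran_active": True},   # ... and its current active version
]
print(find_wrongly_soft_deleted(source, target))
print(connector_status(source, target))
```

The source key set must reflect the source as of the sync being checked. Otherwise, records created after that sync would be reported as false positives.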
-
Hi Matt,
We will implement Databricks multi-statement transactions to properly handle INSERT and UPDATE failures during SCD2 updates; those partial failures are what's causing the data inconsistency you're seeing.
Best regards,
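The reply above proposes wrapping the SCD2 statements in a transaction. The sketch below illustrates why that helps, using Python's built-in `sqlite3` module as a stand-in; it does not show Databricks transaction syntax. It assumes a simplified two-step SCD2 update: first close out the current version, then insert the new one. This is not necessarily Fivetran's actual query sequence. `apply_scd2_change`, the `history` table, and its columns are hypothetical. When the steps run independently, a simulated failure between them leaves the record soft-deleted. When they run in one transaction, the failure is rolled back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so each statement
    # commits on its own unless we issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE history (id TEXT, value TEXT, active INTEGER)")
    conn.execute("INSERT INTO history VALUES ('a', 'v1', 1)")
    return conn


def apply_scd2_change(
    conn: sqlite3.Connection,
    key: str,
    new_value: str,
    *,
    fail_after_close: bool = False,
    atomic: bool = True,
) -> None:
    if atomic:
        conn.execute("BEGIN")
    try:
        # Step 1: close out (soft-delete) the current version of the record.
        conn.execute("UPDATE history SET active = 0 WHERE id = ? AND active = 1", (key,))
        if fail_after_close:
            raise RuntimeError("simulated 500 error between the two steps")
        # Step 2: insert the new active version.
        conn.execute("INSERT INTO history VALUES (?, ?, 1)", (key, new_value))
    except Exception:
        if atomic:
            conn.execute("ROLLBACK")  # undo step 1 so nothing is left half-applied
        raise
    if atomic:
        conn.execute("COMMIT")


def active_rows(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    return conn.execute("SELECT id, value FROM history WHERE active = 1").fetchall()


for atomic in (False, True):
    conn = make_db()
    try:
        apply_scd2_change(conn, "a", "v2", fail_after_close=True, atomic=atomic)
    except RuntimeError:
        pass
    print(f"atomic={atomic}: {active_rows(conn)}")
```

On the non-atomic path, a failure after the first step leaves `active_rows` empty, so record `a` looks deleted even though it still exists in the source. With `atomic=True`, the rollback keeps `('a', 'v1')` active, and a retry starts from a clean state.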