File connectors currently have no way to infer the primary key from the file itself and use it to intelligently de-duplicate and merge data into the destination. As a result, our tables contain many duplicate rows, and none of the workarounds we have considered are ideal:
1. Create a View to Pull Distinct IDs Using max(_fivetran_synced)
This adds latency every time the view is queried by a downstream application (e.g., a Looker report), because the deduplication logic re-runs on each query.
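A sketch of this workaround, assuming an illustrative synced table `raw_events` with primary key column `id` (names are hypothetical, and the exact syntax varies by warehouse):

```sql
-- Keep only the most recently synced row per id.
-- The GROUP BY / JOIN runs on every query of the view,
-- which is the source of the latency described above.
CREATE VIEW raw_events_dedup AS
SELECT r.*
FROM raw_events r
JOIN (
    SELECT id, MAX(_fivetran_synced) AS max_synced
    FROM raw_events
    GROUP BY id
) latest
  ON r.id = latest.id
 AND r._fivetran_synced = latest.max_synced;
```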
2. Re-Create Table Every Time New Data Is Loaded
This becomes increasingly slow and expensive as the table grows, since every load re-scans and rewrites the full table.
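A sketch of the re-create approach, again with hypothetical names (`events_prod`, `raw_events`, `id`); `QUALIFY` is supported in dialects such as Snowflake and BigQuery, and other warehouses need a subquery instead:

```sql
-- Rebuild the deduplicated table from scratch on every load.
-- Cost scales with total table size, not with the size of the new batch.
CREATE OR REPLACE TABLE events_prod AS
SELECT *
FROM raw_events
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY _fivetran_synced DESC
) = 1;
```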
3. Use a MERGE Statement in the Destination Into a Separate "prod" Table
This is not scalable: the MERGE statement enumerates columns explicitly, so it fails (or silently drops data) whenever new columns are added to the source table.
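A minimal sketch of the MERGE workaround with hypothetical table and column names (`events_prod`, `raw_events_latest`, `id`, `col_a`, `col_b`). The hard-coded column lists are exactly what breaks when the file gains a new column:

```sql
-- Upsert the latest batch into the "prod" table by primary key.
-- Any column added to the source later is missing from both the
-- UPDATE SET list and the INSERT list until this statement is edited.
MERGE INTO events_prod AS t
USING raw_events_latest AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.col_a = s.col_a,
               t.col_b = s.col_b
WHEN NOT MATCHED THEN
    INSERT (id, col_a, col_b)
    VALUES (s.id, s.col_a, s.col_b);
```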
Proposal: Allow users to designate a primary key column in the file itself, and have Fivetran automatically use it to deduplicate rows and perform UPDATEs on existing rows in the destination table.