File connectors currently have no way to infer the primary key from the file itself and use it to intelligently deduplicate and merge data into the destination. As a result, our tables contain many duplicate rows, and none of the workarounds is ideal. What we have considered:
1. Create a View to Pull Distinct IDs Using max(_fivetran_synced)
This adds latency every time the view is queried by downstream applications (e.g., a Looker report); see the first sketch after this list.
2. Re-Create Table Every Time New Data Is Loaded
This becomes increasingly slow and expensive as the table grows, because the entire table is rebuilt on every load; see the second sketch after this list.
3. Use a MERGE Statement in the Destination to Maintain a Separate "prod" Table
This is not scalable, as the MERGE statement fails whenever new columns are added to the table and its column lists have to be updated by hand; see the third sketch after this list.
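
For reference, workaround 1 boils down to a view like the one below. All object and column names are hypothetical (`raw.orders`, `analytics.orders_dedup`, and an `id` business key stand in for our real tables); `_fivetran_synced` is the sync timestamp Fivetran already adds to each row.

```sql
-- Hypothetical dedup view: keep only the most recently synced row per id.
CREATE OR REPLACE VIEW analytics.orders_dedup AS
SELECT *
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY t.id                    -- the key Fivetran cannot infer today
            ORDER BY t._fivetran_synced DESC     -- latest synced copy wins
        ) AS rn
    FROM raw.orders AS t
) ranked
WHERE rn = 1;
```

Every downstream query (e.g., from Looker) pays for this window function over the full raw table, which is where the latency comes from.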
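Workaround 2 materializes essentially the same query instead of leaving it as a view, along these lines (same hypothetical names; `QUALIFY` is Snowflake-style syntax, other warehouses may need the subquery form above):

```sql
-- Hypothetical full rebuild on every load: cost scales with total table
-- size, not with the size of the newly arrived file.
CREATE OR REPLACE TABLE analytics.orders_prod AS
SELECT *
FROM raw.orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id
    ORDER BY _fivetran_synced DESC
) = 1;
```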
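Workaround 3 looks roughly like the sketch below (again with hypothetical names and columns such as `status` and `amount`). The pain point is that the UPDATE and INSERT column lists are written out explicitly, so the statement has to be edited every time the file gains a column:

```sql
-- Hypothetical hand-maintained MERGE into a separate "prod" table.
MERGE INTO analytics.orders_prod prod
USING (
    -- newest synced copy of each id from the raw, append-only table
    SELECT *
    FROM raw.orders
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id ORDER BY _fivetran_synced DESC
    ) = 1
) src
    ON prod.id = src.id
WHEN MATCHED THEN UPDATE SET
    status           = src.status,            -- every column listed by hand...
    amount           = src.amount,
    _fivetran_synced = src._fivetran_synced
WHEN NOT MATCHED THEN INSERT (id, status, amount, _fivetran_synced)
    VALUES (src.id, src.status, src.amount, src._fivetran_synced);  -- ...and here
```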
Proposal: Allow users to designate a primary key column in the file itself, and have Fivetran automatically use it to deduplicate incoming data and UPDATE existing rows in the destination table. A concrete example of the desired behavior is sketched below.
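
To make the ask concrete, here is a small before/after illustration with made-up file contents and an `id` column declared as the primary key. Today each sync appends, so the destination ends up with two rows for id 1; the proposed behavior would update that row in place and only insert the genuinely new id:

```sql
-- File synced on day 1:           File synced on day 2:
--   id,status,amount                id,status,amount
--   1,open,100                      1,closed,100   <- same id, changed status
--   2,open,250                      3,open,75      <- brand-new id
--
-- Desired destination state after the day-2 sync:
--   id | status | amount
--    1 | closed |    100   -- updated in place
--    2 | open   |    250   -- untouched
--    3 | open   |     75   -- inserted
```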