Issue
Data is duplicated in the warehouse after an S3 file connector sync.
Environment
Connector: Amazon S3
Resolution
Uploading multiple files with the same name to your S3 bucket results in duplicate data in the warehouse. To prevent this, change the connector's file merge setting:
- Click your S3 connector from the connectors tab on the left side of your dashboard.
- Click Setup.
- Click Edit Connection Details.
- Set Modified File merge to upsert_file.
Cause
The upsert_file option replaces records in the destination, using the filename and line number as the primary key. The append_file option appends records.
If append_file is selected and you upload the same file with a few modifications, Fivetran appends every record from the second upload, so all of its data is duplicated in the destination.
If you need to modify a file and re-upload it to your S3 bucket, setting your connector to use the upsert_file option ensures that duplicates are not created in the warehouse.
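
The difference between the two options can be illustrated with a small Python model. This is only a sketch of the behavior described above, not Fivetran code: the `sync` function, the row layout, and the `users.csv` file are hypothetical, and the model does not cover the case where the re-uploaded file has fewer lines than the original.

```python
# Illustrative model only; the function and data below are hypothetical.

def sync(destination, filename, lines, mode):
    """Load the lines of one file into `destination`, a list of row dicts."""
    if mode == "append_file":
        # Every line becomes a new row, even if it was loaded before.
        for line_number, value in enumerate(lines, start=1):
            destination.append(
                {"file": filename, "line": line_number, "value": value}
            )
    elif mode == "upsert_file":
        # (file, line) acts as the primary key: a matching row is replaced.
        index = {(row["file"], row["line"]): row for row in destination}
        for line_number, value in enumerate(lines, start=1):
            index[(filename, line_number)] = {
                "file": filename, "line": line_number, "value": value
            }
        destination[:] = list(index.values())
    else:
        raise ValueError(f"unknown mode: {mode}")
    return destination


first_upload = ["alice,10", "bob,20"]
second_upload = ["alice,10", "bob,25"]  # same file name, one line modified

appended = []
sync(appended, "users.csv", first_upload, "append_file")
sync(appended, "users.csv", second_upload, "append_file")

upserted = []
sync(upserted, "users.csv", first_upload, "upsert_file")
sync(upserted, "users.csv", second_upload, "upsert_file")

print(len(appended))  # 4 rows: the second upload was added on top of the first
print(len(upserted))  # 2 rows: the second upload replaced the first
```

In this model, the appended destination holds both versions of every line, while the upserted destination holds only the latest version of each line.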