Why is data duplicated in the warehouse after an S3 file connector sync?
Connector: Amazon S3
Uploading multiple files with the same name to your S3 bucket can result in duplicate data in the warehouse. To prevent this, change how the connector handles modified files:
- Click your S3 connector from the connectors tab on the left side of your dashboard.
- Click Setup.
- Click Edit Connection Details.
- Set Modified File merge to upsert_file.
The upsert_file option replaces records in the destination, using the filename and line number as the primary key.
The append_file option appends records.
If append_file is selected and you upload the same file with a few modifications, Fivetran will duplicate all data for the second upload.
If you need to modify a file and re-upload it to your S3 bucket, setting your connector to use the upsert_file option ensures that duplicates are not created in the warehouse.
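
The difference between the two options can be illustrated with a small simulation. This is only a sketch of the behavior described above, not Fivetran's implementation: the functions sync_append and sync_upsert, the file name users.csv, and the sample rows are all hypothetical, and each line of the file is treated as one record.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a destination row: (filename, line_number, content).
Row = Tuple[str, int, str]


def sync_append(table: List[Row], filename: str, lines: List[str]) -> None:
    """Model of append_file: every line of every upload becomes a new row."""
    for line_number, content in enumerate(lines, start=1):
        table.append((filename, line_number, content))


def sync_upsert(table: Dict[Tuple[str, int], str], filename: str, lines: List[str]) -> None:
    """Model of upsert_file: (filename, line_number) acts as the primary key,
    so a re-uploaded line replaces the row that already has that key."""
    for line_number, content in enumerate(lines, start=1):
        table[(filename, line_number)] = content


# The same file uploaded twice; the second upload modifies one line.
first_upload = ["1,Ann", "2,Bob", "3,Cy"]
second_upload = ["1,Ann", "2,Robert", "3,Cy"]

appended: List[Row] = []
sync_append(appended, "users.csv", first_upload)
sync_append(appended, "users.csv", second_upload)

upserted: Dict[Tuple[str, int], str] = {}
sync_upsert(upserted, "users.csv", first_upload)
sync_upsert(upserted, "users.csv", second_upload)

print(len(appended))  # 6 rows: all data from the second upload is duplicated
print(len(upserted))  # 3 rows: each line was replaced in place
```

In this model, the appended table ends up with two rows for every line, while the upserted table keeps one row per filename and line number, with line 2 holding the modified value.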