Follow our setup guide to connect your AWS S3 bucket to Fivetran.
Prerequisites
To connect your AWS S3 bucket to Fivetran, you need:
- An S3 bucket containing files with supported file types and encodings
- For private or encrypted buckets, an AWS account that can grant Fivetran permission to read from the bucket
Setup instructions
To authorize Fivetran to connect to your S3 bucket, follow these instructions:
IMPORTANT: If you are syncing from a public bucket, skip ahead to the Finish Fivetran configuration step. You don’t need an AWS account to sync from public buckets.
We recommend disabling Access Control Lists (ACLs) on each S3 bucket so that the bucket contents are controlled by the bucket’s access control settings and not the original file owner’s settings. For more information about disabling ACLs for your bucket, see AWS S3 documentation.
Get your External ID
In the connector setup form, find the automatically-generated External ID and make a note of it. You will need it to create an IAM role in AWS.
NOTE: The automatically-generated External ID is tied to your account. If you close and re-open the setup form, the ID will remain the same. You can keep the tab open in the background while you configure your source for convenience.
Create an IAM Policy for Fivetran
- Open your Amazon IAM console.
- Go to Policies, then select Create Policy.
- Go to the JSON tab.
- Copy the following policy and paste it in the JSON tab, replacing {your-bucket-name} with the name of your S3 bucket.
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:Get*",
          "s3:List*"
        ],
        "Resource": [
          "arn:aws:s3:::{your-bucket-name}/*",
          "arn:aws:s3:::{your-bucket-name}"
        ]
      }
    ]
  }
- For encrypted buckets, follow Amazon’s AWS S3 bucket instructions to modify the AWS KMS key’s policy to grant Fivetran permissions to download files from your encrypted bucket.
- Click Review Policy.
- Name the policy “Fivetran-S3-Access”.
- Click Create Policy.
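If you would rather generate the policy than edit the JSON by hand, the following Python sketch fills in the bucket name and prints a document you can paste into the JSON tab. The function name `build_s3_read_policy` and the bucket name `example-bucket` are hypothetical; only the policy content comes from the step above.

```python
import json


def build_s3_read_policy(bucket_name: str) -> dict:
    """Return the read-only policy from the step above for one bucket."""
    # Guard against pasting the placeholder unchanged.
    if not bucket_name or "{" in bucket_name or "}" in bucket_name:
        raise ValueError("replace {your-bucket-name} with a real bucket name")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*"],
                # Objects in the bucket, then the bucket itself.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


# "example-bucket" is a placeholder, not a real bucket.
print(json.dumps(build_s3_read_policy("example-bucket"), indent=2))
```

Generating the document this way avoids the most common hand-editing mistake, which is leaving the curly braces around the bucket name.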
Create an IAM role for Fivetran
- Go to Roles, then select Create New Role.
- Select Another AWS Account, then enter Fivetran’s account ID, 834469178297, in the Account ID field.
- Select the Require external ID checkbox.
- Enter the External ID you found in your connector setup form.
- Click Next: Permissions.
- Select the “Fivetran-S3-Access” policy that you created in the Create an IAM Policy for Fivetran step.
- Click Next: Tags. Adding tags is optional.
- Click Next: Review.
- Name your new role “Fivetran”, then click Create Role.
- Select the Fivetran role you just created.
- Find the Role ARN and make a note of it. You will need it to fill in your connector setup form.
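Selecting Another AWS Account together with Require external ID results in a trust policy on the role. As a rough illustration, the Python sketch below builds a document of the standard cross-account shape: Fivetran’s account as the principal, with an `sts:ExternalId` condition. The function name `build_trust_policy` and the sample External ID are hypothetical, and the exact JSON that the console generates may differ, so compare it with the Trust relationships tab of your role.

```python
import json

# Account ID given in the step above.
FIVETRAN_ACCOUNT_ID = "834469178297"


def build_trust_policy(external_id: str) -> dict:
    """Return a cross-account trust policy that requires an External ID."""
    if not external_id:
        raise ValueError("an External ID from the connector setup form is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{FIVETRAN_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # The role can be assumed only when this External ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# "example-external-id" is a placeholder for the value in your setup form.
print(json.dumps(build_trust_policy("example-external-id"), indent=2))
```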
Permissions (Optional)
You can restrict the permissions of the role whose ARN you give to Fivetran. If you grant this role access to only part of your bucket, Fivetran syncs only the files that the role is permitted to read.
(Optional) Configure AWS PrivateLink BETA
IMPORTANT: You must have a Business Critical plan to use AWS PrivateLink.
AWS PrivateLink allows VPCs and AWS-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. Learn more in AWS’ PrivateLink documentation.
Follow our AWS PrivateLink setup guide to configure PrivateLink for your S3 bucket.
NOTE: You can give Fivetran access to your data in two ways: by using IAM policies to control access to S3 buckets (recommended) or by using access points.
(Optional) Configure Access Point
Create access point
- Create an access point to provide Fivetran access to your S3 bucket.
- Open your Amazon S3 console.
- On the left navigation pane, click Access Points.
- Select the access point.
- Go to the Properties tab. Make a note of the Access Point alias. You will need it to configure Fivetran.
Add the following statement to your bucket policy to grant access to your bucket through the access point. Replace the placeholders in curly braces with your own values.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{account-number}:role/{role-name}"
      },
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}",
        "arn:aws:s3:::{your-bucket-name}/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:DataAccessPointArn": "arn:aws:s3:us-west-2:{account-number}:accesspoint/{your-access-point}"
        }
      }
    }
  ]
}
Create IAM Policy for access point
Follow the steps in Create an IAM Policy for Fivetran, but copy the following policy and paste it in the JSON tab instead, replacing {your-access-point} with the name of the access point you created and {account-number} with your AWS account ID. If your access point is not in us-west-2, also replace the Region in the ARNs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:us-west-2:{account-number}:accesspoint/{your-access-point}",
        "arn:aws:s3:us-west-2:{account-number}:accesspoint/{your-access-point}/*"
      ]
    }
  ]
}
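The access point policy above contains three values that vary per account: the Region, the account number, and the access point name. The Python sketch below builds the same policy with all three as parameters, which may help if your access point is not in us-west-2. The function names and the sample values are hypothetical; the policy shape is copied from the example above.

```python
import json


def access_point_arn(region: str, account_number: str, access_point: str) -> str:
    """Build an S3 access point ARN in the form used by the policy above."""
    return f"arn:aws:s3:{region}:{account_number}:accesspoint/{access_point}"


def build_access_point_policy(region: str, account_number: str, access_point: str) -> dict:
    """Return the read-only IAM policy for one access point."""
    arn = access_point_arn(region, account_number, access_point)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*"],
                # The access point itself, then everything reachable through it.
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }


# All three arguments are placeholders for illustration only.
policy = build_access_point_policy("us-west-2", "111122223333", "example-ap")
print(json.dumps(policy, indent=2))
```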
Finish Fivetran configuration
- In the connector setup form, enter the Destination schema name of your choice.
- Enter your Destination table name.
- Enter your S3 Bucket name.
  IMPORTANT: If you are using an access point, enter the Access Point alias in this field. If you don’t have an alias yet, create one by following the Configure Access Point instructions.
- If you are syncing from a private bucket, enter the Role ARN you found in AWS.
- (Optional) If you are syncing from a public bucket, set the Public? toggle to ON.
- Choose your configuration options. Using these configuration options, you can select subsets of your folders, specific types of files, and more to sync only the files you need in your destination. In addition, you can set up multiple connectors that target the same container with different options to split its contents any way you’d like.
  You can use the following configuration options:
  - (Optional) Folder Path - Use the folder path to specify a portion of the container in which you’d like Fivetran to look for files. We examine files under the specified folder and all of its nested subfolders for files we can sync. If you don’t provide a prefix, we’ll look through the entire container for files to sync.
  - (Optional) File Pattern - Use a regular expression as the file pattern to decide whether or not to sync specific files. The pattern applies to everything under the prefix (folder path). If you’re unsure what regular expression to use, you can leave this field blank, and we’ll sync everything under the prefix.
    For example, suppose that under the prefix logs you have three folders: 2017, 2016, and errors. Using the pattern \d\d\d\d/.*, you can exclude all the files in the errors folder because:
    - \d\d\d\d only applies to the folders whose name consists of four consecutive digits, and
    - .* after / applies to any files in these folders
    TIP: You can learn to write your regex and test it out.
  - File Type - Use the file type to choose the parsing strategy for files without file extensions. If you save your files with improper extensions, you can force them to be synced as the selected file type.
    - If you select infer, we infer the type from a file’s extension (.csv, .tsv, .json, .avro, or .log).
    - If you choose a file type, we interpret every file we examine as the file type you select, so make sure everything we sync has the same file type.
      For example, if you have an automated CSV output system that saves files without a .csv extension, you can specify the type as csv, and we will sync them correctly as CSVs.
  - Compression - Use the compression option to choose the compression strategy to decompress files without compression extensions. If your files are compressed but do not have extensions indicating the compression method, you can decompress them according to the selected compression algorithm.
    - If all of your compressed files are correctly marked with a matching compression extension (.bz2, .gz, .gzip, .tar, or .zip), you can select infer.
    - If you select uncompressed, we sync the files as they are, without decompressing them.
    - If you choose a compression format, we decompress every file using the format you select.
      For example, if you have an automated CSV output system that GZIPs files to save space but saves them without a .gzip extension, you can set this field to GZIP. We will decompress every file that we examine using GZIP.
  - Error Handling - Use the error handling option to choose how to handle errors in your files. If you know that your files contain some errors, you can choose to skip poorly formatted lines.
    - If you select skip, we ignore improperly formatted data within a file, allowing you to sync only valid data.
    - If you select fail, we do not sync a file if we detect improperly formatted data in the file.
      TIP: We recommend that you select fail unless you are sure that you have undesirable, malformed data.
      You will receive a notification on your Fivetran dashboard if we encounter errors.
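To check a File Pattern before you enter it, you can try it against a few sample paths locally. The Python sketch below uses the pattern from the File Pattern example, \d\d\d\d/.*, and assumes that the pattern must match the whole path relative to the prefix. That matching rule is an assumption for illustration and is not confirmed by this guide, and the file names are made up.

```python
import re

# Pattern from the File Pattern example: a four-digit folder, then any file.
FILE_PATTERN = re.compile(r"\d\d\d\d/.*")


def matches_file_pattern(path_under_prefix: str) -> bool:
    """Return True if the whole relative path matches the pattern."""
    # Assumption: the full path under the prefix must match, not a substring.
    return FILE_PATTERN.fullmatch(path_under_prefix) is not None


# Hypothetical paths relative to the "logs" prefix.
paths = ["2017/jan.csv", "2016/dec.csv", "errors/bad.csv"]
selected = [p for p in paths if matches_file_pattern(p)]
print(selected)  # ['2017/jan.csv', '2016/dec.csv']
```

If your regex engine searches for a substring instead of matching the whole path, a path such as errors/2017/x.csv would also match, so test your pattern against nested folders as well.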
- (Optional) To use the advanced configuration options, set the Enable Advanced Options toggle to ON.
  You can use the following configuration options for specific use cases:
  - Modified File Merge - Use this option to tell Fivetran how to update the destination when you modify a previously synced file. We can either replace the rows in the destination table or append the new rows to the table:
    - upsert_file replaces records in the destination, using the filename and line number as the primary key.
    - append_file appends records.
  - (Optional) Archive Folder Pattern - Use a regular expression to filter and sync files from archived folders. We sync the files in compressed archives with filenames matching the specified pattern. If there are multiple files within archive (TAR or ZIP) folders, you can use the archive folder pattern to filter file types.
    For example, if you specify the archive folder pattern as .*json, we will sync only the files that end in a .json file extension from the archive folder.
  - (Optional) Null Sequence - Specify the value indicating null if your CSVs use a special value indicating null.
    Only use this field if you are sure your CSVs have a null sequence. CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N to represent null.
    TIP: The text is un-escaped before the null sequence is matched, so don’t use the escape character in your null sequence.
  - (Optional) Delimiter - Specify the delimiter. The delimiter is a character used in CSV files to separate one field from the next. Fivetran tries to infer the delimiter, but in some cases, this is impossible. If your files sync with the wrong number of columns, consider setting this value.
    - If you leave this field blank, we infer the delimiter for each file. You can store files of many different types of delimiters in the same folder with no problems.
    - If you specify a delimiter, we parse all the CSV files in your folder path with this delimiter.
  - (Optional) Escape Character - Set the escape character if your CSV generator follows non-standard rules for escaping quotation marks.
    Only use this field if you are sure your CSVs have a different escape character. CSVs have a special rule for escaping quotation marks compared to other characters; they require two consecutive double quotes to represent an escaped double quote. However, some CSV generators do not follow this rule and use different characters like backslash for escaping.
  - (Optional) Skip Header Lines - Use this option to skip over fixed-length headers at the beginning of your CSV files.
    Some CSV-generating programs include additional header lines at the top of the file. The header consists of a few lines that do not match the format of the rest of the rows in the file. These header rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
  - (Optional) Skip Footer Lines - Use this option to skip over fixed-length footers at the end of your CSV files.
    Some CSV-generating programs include a footer at the bottom of the file. The footer consists of a few lines that do not match the format of the rest of the rows in the file. These footer rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
  - (Optional) Headerless Files - Set the toggle to ON if your CSV-generating software doesn’t provide a header line for the documents. Fivetran can generate the generic column names and sync data rows with them.
    Some CSV-generating programs do not include column name headers for the files; they only contain data rows. When you set the toggle to ON, we generate generic column names following the convention of column_0, column_1, … column_n to map the rows.
  - (Optional) List Strategy - Select the listing strategy you want to use:
    - complete_listing - The default option, where we list all the new and modified files from the bucket.
    - time_based_pattern_listing - You can opt to use this strategy if your files are named based on the date or time they are added to the bucket. If you add new files in lexicographic order to the bucket, in each sync, we try to identify a time-based pattern. We only list and sync the files that are lexicographically greater than the last file synced in the previous sync.
      NOTE: If we are unable to identify a time-based pattern, we use the default option.
  - (Optional) To always connect using AWS PrivateLink, set the Require PrivateLink toggle to ON.
    NOTE: By default, we use PrivateLink to connect if your S3 bucket and destination are in the same region. Enabling this option ensures that we always use PrivateLink to connect. If the regions are different, Fivetran won’t create the connection.
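The time_based_pattern_listing strategy depends on file names that sort in the order they were added. As a simplified illustration of "lexicographically greater than the last file synced", the Python sketch below compares object keys as plain strings. It is not Fivetran’s implementation, and the function name and keys are hypothetical.

```python
from typing import Iterable, List, Optional


def files_to_sync(keys: Iterable[str], last_synced_key: Optional[str]) -> List[str]:
    """Return keys that sort after last_synced_key, in lexicographic order."""
    candidates = sorted(keys)
    if last_synced_key is None:
        # Nothing synced yet, so every key is a candidate.
        return candidates
    return [key for key in candidates if key > last_synced_key]


# Hypothetical keys named by date, so string order equals time order.
keys = ["logs/2024-01-03.csv", "logs/2024-01-01.csv", "logs/2024-01-02.csv"]
print(files_to_sync(keys, "logs/2024-01-01.csv"))
```

The comparison works only when names are zero-padded and ordered from the largest time unit to the smallest (for example, 2024-01-02 rather than 2-1-2024). With other naming schemes, a newer file can sort before an older one and be skipped by this kind of comparison.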
- Click Save & Test. Fivetran will take it from here and sync your data from your AWS S3 bucket.
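To get a feel for what the CSV-related advanced options do (Delimiter, Skip Header Lines, Skip Footer Lines, and Headerless Files), you can reproduce them on a sample file with Python’s standard csv module. This is a local approximation for experimenting with your own files, not Fivetran’s parser. The function name, parameter names, and sample data are hypothetical.

```python
import csv
from typing import Dict, List


def parse_csv(
    text: str,
    delimiter: str = ",",
    skip_header_lines: int = 0,
    skip_footer_lines: int = 0,
    headerless: bool = False,
) -> List[Dict[str, str]]:
    """Parse CSV text into row dictionaries using the options described above."""
    # Splitting on line breaks keeps the sketch short; it does not handle
    # quoted fields that contain embedded newlines.
    lines = text.splitlines()
    body = lines[skip_header_lines : len(lines) - skip_footer_lines]
    rows = list(csv.reader(body, delimiter=delimiter))
    if not rows:
        return []
    if headerless:
        # Generic names following the column_0, column_1, ... convention.
        names = [f"column_{i}" for i in range(len(rows[0]))]
        data = rows
    else:
        names, data = rows[0], rows[1:]
    return [dict(zip(names, row)) for row in data]


# Made-up file with one extra header line and one footer line.
sample = "report generated by export tool\nid;name\n1;a\n2;b\nTOTAL 2\n"
print(parse_csv(sample, delimiter=";", skip_header_lines=1, skip_footer_lines=1))
```

Running the same sample without the skip options shows why they matter: the extra header line is treated as the column names and the footer becomes a data row.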
Related Content
- Connector Overview
- API Connector Configuration