Follow our setup guide to connect Databricks to Fivetran.
IMPORTANT: If you used Databricks Partner Connect to set up your Fivetran account, you don’t need to follow the setup guide instructions because you already have a connection to Databricks.
Prerequisites
To connect Databricks to Fivetran, you need the following:
- a Databricks account
- a Databricks administrator account if you choose to connect using a Databricks SQL endpoint
- a Fivetran account with permission to add destinations
You can connect Databricks to Fivetran using one of the following:
- A Databricks cluster (version 7.0-10.x). For setup instructions, see Connect Databricks cluster.
- A Databricks SQL endpoint. For setup instructions, see Connect SQL endpoint.
Setup instructions
Connect Databricks cluster
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
- Log in to your Databricks account.
- In the Databricks console, go to Data Science & Engineering > Create > Cluster.
- Enter a Cluster name of your choice.
- Set the Databricks Runtime Version to 7.3 or later (10.4 LTS recommended).
- In the Advanced Options section, select Spark.
- In the Spark Config field, paste the following code:

  spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3n.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3n.impl.disable.cache true
  spark.hadoop.fs.s3.impl.disable.cache true
  spark.hadoop.fs.s3a.impl.disable.cache true
  spark.hadoop.fs.s3.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem

- In the Environment Variables field, paste the following code:

  AWS_STS_REGIONAL_ENDPOINTS="{CloudServiceRegionMapping}"

  This variable is used for the internal S3 bucket that Fivetran uses to stage your data before writing it into your Databricks cluster. The cloud service region parameter ensures that the correct region is used for this staging bucket. (For a programmatic view of how this variable and the Spark configuration fit together, see the sketch after these steps.)
- Depending on the cloud service region, specify the storage container region. For example, AWS_STS_REGIONAL_ENDPOINTS="us-east-1". See the following mapping table for details:

  | Cloud Service Region | Storage Container Region |
  | --- | --- |
  | us-east4 | us-east-1 |
  | us-east-1 | us-east-1 |
  | eastus2 | us-east-1 |
  | us-east-2 | us-east-2 |
  | us-west-1 | us-west-1 |
  | us-west-2 | us-west-2 |
  | eu-west-1 | eu-west-1 |
  | eu-west-2 | eu-west-2 |
  | eu-west-3 | eu-west-3 |
  | eu-central-1 | eu-central-1 |
  | eu-north-1 | eu-north-1 |
  | ap-south-1 | ap-south-1 |
  | ap-southeast-1 | ap-southeast-1 |
  | ap-southeast-2 | ap-southeast-2 |
  | ap-northeast-1 | ap-northeast-1 |
  | ap-northeast-2 | ap-northeast-2 |
  | sa-east-1 | sa-east-1 |
  | ca-central-1 | ca-central-1 |
  | us-west1 | us-west-2 |
  | europe-west2 | eu-west-2 |
  | europe-west3 | eu-central-1 |
  | northamerica-northeast1 | ca-central-1 |
  | asia-southeast1 | ap-southeast-1 |
  | australia-southeast1 | ap-southeast-2 |

- Click Create Cluster.
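The same cluster can also be created programmatically. The sketch below is only an illustration, assuming the Databricks Clusters REST API (POST /api/2.0/clusters/create) and an existing personal access token; the workspace URL, node type, and worker count are placeholder values you would replace with your own. It shows where the Spark configuration and the AWS_STS_REGIONAL_ENDPOINTS variable from the steps above fit into a cluster specification.

import requests

# Placeholder values -- replace with your own workspace URL and token.
DATABRICKS_HOST = "https://your-workspace.cloud.databricks.com"
TOKEN = "your-personal-access-token"

cluster_spec = {
    "cluster_name": "fivetran-destination",   # any name you like
    "spark_version": "10.4.x-scala2.12",      # 10.4 LTS, as recommended above
    "node_type_id": "i3.xlarge",              # placeholder node type
    "num_workers": 1,                         # placeholder cluster size
    "spark_conf": {
        # Same values as the Spark Config field in the setup step above.
        "spark.hadoop.fs.s3a.impl": "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3n.impl": "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3.impl": "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3n.impl.disable.cache": "true",
        "spark.hadoop.fs.s3.impl.disable.cache": "true",
        "spark.hadoop.fs.s3a.impl.disable.cache": "true",
    },
    "spark_env_vars": {
        # Use the storage container region from the mapping table above.
        "AWS_STS_REGIONAL_ENDPOINTS": "us-east-1",
    },
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
response.raise_for_status()
print(response.json())  # the response includes the new cluster_id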
Get credentials
- In the Advanced Options window, select JDBC/ODBC.
- Make a note of the following values. You will need them to configure Fivetran.
  - Server Hostname
  - Port
  - HTTP Path
For further instructions, skip to the Create a personal access token step.
Connect SQL endpoint
To connect to a SQL endpoint, do the following:
Create a Databricks SQL endpoint
- Log in to your Databricks account.
- In the Databricks console, go to SQL > Create > SQL Endpoint.
- In the New SQL Endpoint window, enter a Name for your endpoint.
- Choose your Cluster Size, configure other endpoint options, and then click Create.
Configure data access
- In the Databricks console, click Settings > SQL Admin Console.
- In the Settings window, select SQL Endpoint Settings.
- (Optional) If you are using external data storage, select the Instance Profile.
- In the Data Access Configuration box, paste the following code:

  spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3n.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3n.impl.disable.cache true
  spark.hadoop.fs.s3.impl.disable.cache true
  spark.hadoop.fs.s3a.impl.disable.cache true
  spark.hadoop.fs.s3.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem

- Scroll down and click Save changes.
Get credentials
- In the SQL console, click SQL Endpoints, and select the SQL endpoint you created earlier.
- Go to the Connection Details tab.
- Make a note of the following values. You will need them to configure Fivetran.
  - Server Hostname
  - Port
  - HTTP Path
Create a personal access token
Follow Databricks’ token management guide to create a new personal access token.
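Before you enter these values in Fivetran, you can optionally sanity-check the Server Hostname, HTTP Path, and personal access token yourself. The sketch below is only an illustration, assuming the open source databricks-sql-connector Python package (installed with pip install databricks-sql-connector); the connection values are placeholders you would replace with the ones you noted above.

from databricks import sql

# Placeholder values -- use the Server Hostname, HTTP Path, and token you noted above.
with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/your-endpoint-id",
    access_token="your-personal-access-token",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")   # trivial query to confirm the connection works
        print(cursor.fetchall())     # a returned row means the credentials are valid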
Complete Fivetran configuration
- Log in to your Fivetran account.
- Go to the Manage Account page.
- In the Destinations tab, click +Destination.
- On the Add Destination To Fivetran page, enter a Destination Name of your choice.
- Click Add Destination.
- Depending on your infrastructure, select Databricks on AWS, Databricks on Azure, or Databricks on Google Cloud as the destination type.
- Enter the Server Hostname.
- Enter the Port number.
- Enter the HTTP Path.
- Enter the Personal Access Token you created.
- (Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can:
  - Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
  - Use the default Databricks File System location registered with the cluster. Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path.
- Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and AWS region as described in our Destinations documentation.
- Choose your Timezone.
- Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can use Fivetran connectors to sync your data to the Databricks destination.
Setup tests
Fivetran performs the following Databricks connection tests:
- The Connection test checks if we can connect to the Databricks cluster through Java Database Connectivity (JDBC) using the credentials you provided in the setup form.
- The Check Version Compatibility test verifies the Databricks cluster version's compatibility with Fivetran.
- The Check Cluster Configuration test validates the Databricks cluster's environment variables and the Spark configuration for standard clusters.
- The Validate Permissions test checks if we have the necessary READ/WRITE permissions to CREATE, ALTER, or DROP tables in the database. The test also checks if we have the permissions to copy data from Fivetran's external AWS S3 staging bucket. (A sketch of a comparable manual check follows this list.)

NOTE: The tests may take a couple of minutes to finish running.
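If the Validate Permissions test fails, it can help to confirm that your token can run the same kinds of statements manually. The sketch below is only an illustration, again assuming the databricks-sql-connector package; the connection values and table name are placeholders, and it covers only the CREATE/ALTER/DROP portion of the test, not the copy from Fivetran's S3 staging bucket.

from databricks import sql

# Placeholder connection values -- use the ones you noted earlier.
with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/your-endpoint-id",
    access_token="your-personal-access-token",
) as connection:
    with connection.cursor() as cursor:
        # Create, alter, and drop a throwaway Delta table to confirm DDL permissions.
        cursor.execute("CREATE TABLE IF NOT EXISTS fivetran_permission_check (id INT) USING DELTA")
        cursor.execute("ALTER TABLE fivetran_permission_check ADD COLUMNS (note STRING)")
        cursor.execute("DROP TABLE fivetran_permission_check")
        print("CREATE/ALTER/DROP succeeded")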
Related Content
- Destination Overview
- API Destination Configuration