BigQuery is Google’s serverless, petabyte-scale data warehouse for analytics. With BigQuery, you can run analytics over large amounts of data using standard SQL queries.
Fivetran can sync to BigQuery as frequently as every five minutes.
Setup guide
Follow our step-by-step BigQuery setup guide to connect your BigQuery data warehouse with Fivetran.
Type transformation mapping
The data types in your BigQuery data warehouse follow Fivetran’s standard data type storage.
We use the following data type conversions:
Fivetran Data Type | Destination Data Type | Notes |
---|---|---|
BOOLEAN | BOOLEAN | |
SHORT | INTEGER | |
INT | INTEGER | |
LONG | INTEGER | |
FLOAT | FLOAT | |
DOUBLE | FLOAT | |
BIGDECIMAL | NUMERIC or BIGNUMERIC | For source connectors created after August 3, 2021, we convert BIGDECIMAL to BIGNUMERIC. |
LOCALDATE | DATE | |
INSTANT | TIMESTAMP | |
LOCALDATETIME | DATETIME | |
STRING | STRING | |
JSON | STRING | |
BINARY | BYTES | |
Fivetran represents nested JSON (including arrays) as STRING columns in BigQuery. This happens with semi-structured data sources (for example: webhooks, MongoDB, JSON files).
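The type mapping above can be sketched as a simple lookup. The helper below is illustrative only (not part of any Fivetran API); the `BIGDECIMAL_CUTOFF` constant models the connector-creation-date rule from the table:

```python
from datetime import date

# Sketch of the Fivetran-to-BigQuery type mapping (hypothetical helper,
# not part of any Fivetran API).
TYPE_MAP = {
    "BOOLEAN": "BOOLEAN",
    "SHORT": "INTEGER",
    "INT": "INTEGER",
    "LONG": "INTEGER",
    "FLOAT": "FLOAT",
    "DOUBLE": "FLOAT",
    "LOCALDATE": "DATE",
    "INSTANT": "TIMESTAMP",
    "LOCALDATETIME": "DATETIME",
    "STRING": "STRING",
    "JSON": "STRING",
    "BINARY": "BYTES",
}

# Connectors created after this date get BIGNUMERIC for BIGDECIMAL.
BIGDECIMAL_CUTOFF = date(2021, 8, 3)

def destination_type(fivetran_type: str, connector_created: date) -> str:
    """Return the BigQuery destination type for a Fivetran data type."""
    if fivetran_type == "BIGDECIMAL":
        return "BIGNUMERIC" if connector_created > BIGDECIMAL_CUTOFF else "NUMERIC"
    return TYPE_MAP[fivetran_type]

print(destination_type("BIGDECIMAL", date(2022, 1, 1)))  # BIGNUMERIC
print(destination_type("LONG", date(2020, 1, 1)))        # INTEGER
```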
Parse JSON data
Use BigQuery’s native JSON functions in Standard SQL to retrieve data stored in JSON-formatted strings.
For example, suppose your BigQuery destination has a table TEST.JSON_PARSING_EXAMPLE (id INT64, json STRING) that stores the following nested JSON structure:
{
"id":1,
"name": {
"first_name": "john",
"last_name": "smith",
"full_name": "john_smith"
},
"addresses":[
{
"name": "home",
"city": "oakland",
"state": "california",
"country": "usa"
},
{
"name": "work",
"city": "denver",
"state": "colorado",
"country": "usa"
}
]
}
To parse the data:

1. Query the data. Execute the following command:

   select
     json_value(json, '$.id') as json_id,
     json_value(json, '$.name.first_name') as first_name
   from test.json_parsing_example;

   Row | json_id | first_name |
   ---|---|---|
   1 | 1 | john |

2. Flatten the data. Execute the following command:

   select
     json_value(json, '$.id') as json_id,
     json_value(json, '$.name.first_name') as first_name,
     json_value(n, '$.name') as addresses_name,
     json_value(n, '$.city') as addresses_city,
     json_value(n, '$.state') as addresses_state,
     json_value(n, '$.country') as addresses_country
   from test.json_parsing_example,
     unnest(json_query_array(json, '$.addresses')) n;

   Row | json_id | first_name | addresses_name | addresses_city | addresses_state | addresses_country |
   ---|---|---|---|---|---|---|
   1 | 1 | john | home | oakland | california | usa |
   2 | 1 | john | work | denver | colorado | usa |
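Outside of SQL, the same extraction can be sketched in Python with the standard json module. This is illustrative only; BigQuery's JSON_VALUE and JSON_QUERY_ARRAY semantics differ in edge cases such as missing paths:

```python
import json

raw = """{
  "id": 1,
  "name": {"first_name": "john", "last_name": "smith", "full_name": "john_smith"},
  "addresses": [
    {"name": "home", "city": "oakland", "state": "california", "country": "usa"},
    {"name": "work", "city": "denver", "state": "colorado", "country": "usa"}
  ]
}"""

doc = json.loads(raw)

# Equivalent of json_value(json, '$.id') and json_value(json, '$.name.first_name').
row = (doc["id"], doc["name"]["first_name"])
print(row)  # (1, 'john')

# Equivalent of unnest(json_query_array(json, '$.addresses')):
# one output row per array element, repeating the parent fields.
flattened = [
    (doc["id"], doc["name"]["first_name"],
     a["name"], a["city"], a["state"], a["country"])
    for a in doc["addresses"]
]
for r in flattened:
    print(r)
```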
Flatten and query JSON data
Fivetran supports JSON flattening. By default, the feature is not enabled on your destination because Fivetran doesn’t support automatic data type promotion from STRING to JSON.
Contact our support team to enable the feature on your destination. After we enable the feature, use your connector’s dashboard to drop and re-sync the tables.
For example, if you sync a source column json_col with the following JSON value:

{"id": 123, "nums": [1, 2, 3, 4], "address": {"city": "ny", "pin": 12345}}

We flatten the data and write it to your destination as:
json_col.id | json_col.nums | json_col.address.city | json_col.address.pin |
---|---|---|---|
123 | 1 | ny | 12345 |
| 2 | | |
| 3 | | |
| 4 | | |
Although this is a single column of JSON data, you can query the nested fields as if they were separate columns. For example:
select json_col.id, json_col.address.city from <datasetId>.<tableId>;
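Under stated assumptions, the row expansion shown in the table above can be sketched in Python: nested scalars become dotted column names, while each array element gets its own row. This is a simplified model of the flattening behavior, not Fivetran's implementation:

```python
import json

record = json.loads(
    '{"id": 123, "nums": [1, 2, 3, 4], "address": {"city": "ny", "pin": 12345}}'
)

# Nested scalar fields become dotted column names.
scalars = {
    "json_col.id": record["id"],
    "json_col.address.city": record["address"]["city"],
    "json_col.address.pin": record["address"]["pin"],
}

# Each array element becomes its own row under json_col.nums;
# the scalar columns appear only on the first row, as in the table above.
rows = []
for i, n in enumerate(record["nums"]):
    row = dict(scalars) if i == 0 else {}
    row["json_col.nums"] = n
    rows.append(row)

for row in rows:
    print(row)
```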
We apply our standard naming conventions to the JSON child column names. For example, we rename source columns such as 3h and ABC Column to _3_h and abc_column, respectively, in your destination.
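A minimal sketch of this renaming, with the rules approximated from the two examples above (the exact conventions Fivetran applies may cover more cases):

```python
import re

def normalize_column_name(name: str) -> str:
    """Approximate Fivetran-style column renaming (inferred, not official)."""
    # Lowercase and replace runs of whitespace with underscores.
    s = re.sub(r"\s+", "_", name.strip().lower())
    # Separate a digit run from a following letter run (inferred from "3h" -> "_3_h").
    s = re.sub(r"(?<=\d)(?=[a-z])", "_", s)
    # Prefix with an underscore if the name starts with a digit.
    if s[0].isdigit():
        s = "_" + s
    return s

print(normalize_column_name("3h"))          # _3_h
print(normalize_column_name("ABC Column"))  # abc_column
```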
You may observe longer sync durations for these tables because JSON columns consume more compute resources.
If we find a non-JSON data type or corrupt data in the source column, we convert the whole column to the STRING type in the destination.
NOTE: Contact our support team if you want to disable JSON flattening on your destination.
Roles and permissions
The minimum role required for Fivetran to function is BigQuery User. This role provides permissions to:
- run, enumerate, and cancel our own jobs (updates)
- enumerate all datasets within a project
- create new datasets within the project and grant the bigquery.dataOwner role for these new datasets
These permissions are required because Fivetran creates its own datasets and modifies them as changes are made to the source. These permissions do not allow Fivetran to modify datasets it does not own. Use a separate project for datasets that you do not want Fivetran to be able to query.
Data load costs
Fivetran uses SQL queries to merge data into the tables in your data warehouse. Because BigQuery bills for query usage, you are charged when Fivetran loads data into your data warehouse. Partitioned tables significantly reduce costs because they allow Fivetran to quickly identify where to insert data in a table without scanning every row.
Column names
Fivetran ignores the case of column names in your destination tables because BigQuery treats column names as case-insensitive.
Partition modification limitations
BigQuery limits the maximum number of partition modifications. When we detect a weak primary key in a table, we update the data in the destination table using the following steps:
1. Create a copy table that contains all the data of the main table.
2. Delete from the copy table the rows that are present in the staging table.
3. Copy with append all the data from the staging table to the copy table.
4. Copy with truncate all the data from the copy table to the main table.
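The four steps can be modeled in plain Python, treating each table as a list of keyed rows. This is a simulation of the sequence, not actual BigQuery operations:

```python
# Each table is a list of (primary_key, value) rows.
main = [(1, "a"), (2, "b"), (3, "c")]
staging = [(2, "b2"), (4, "d")]  # row 2 updated, row 4 new

# 1. Create a copy table with all the data of the main table.
copy = list(main)

# 2. Delete from the copy table the rows whose keys appear in the staging table.
staged_keys = {k for k, _ in staging}
copy = [row for row in copy if row[0] not in staged_keys]

# 3. Copy with append: add all staging rows to the copy table.
copy.extend(staging)

# 4. Copy with truncate: replace the main table's contents with the copy table.
main = list(copy)

print(sorted(main))  # [(1, 'a'), (2, 'b2'), (3, 'c'), (4, 'd')]
```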
NOTE: If the main table is a column-partitioned table, the copy with truncate operation truncates the destination table and copies all data from the copy table to the main table. The operation modifies all the partitions in the main table.
If the sync fails with the QuotaExceededException error, we delete the main table and then copy it back from the copy table to reset the partition quota for the day.
You can also resolve this error by:
- Increasing your BigQuery quota
- Reducing your sync frequency