Amazon DocumentDB is a fully managed NoSQL database service that is built for JSON data management and integrated with AWS.
Supported configurations
Fivetran supports the following DocumentDB configurations:
Supportability Category | Supported Values |
---|---|
Database versions | 4.0+ |
Maximum throughput * | 5.0 MBps |
Connector limit per database | 3 |
* Maximum throughput is your connector’s end-to-end update speed, measured in megabytes per second (MBps). We calculate the maximum throughput by averaging the number of rows synced per second during your connector’s last 3-4 syncs. To learn more about sync speed, see the Replication speeds section.
Network protocol | Supported Versions | Notes |
---|---|---|
Transport Layer Security (TLS) | TLS 1.0, TLS 1.1, TLS 1.2 | We can only support TLS versions that your corresponding version of the database supports. |
Known limitations
- Fivetran can only connect to DocumentDB primary instances because DocumentDB does not support reading change streams from replica instances. We need change streams to perform incremental updates.
- Fivetran can only connect to DocumentDB using SSH tunneling because of DocumentDB’s security feature limitations.
- Fivetran only syncs DocumentDB documents that are smaller than 16 MB. If you try to sync a document that is 16 MB or larger, we skip syncing that document and notify you with a warning message in your Fivetran dashboard.
- Fivetran does not support syncing non-materialized views.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
Custom data | check | All collections and fields |
Data blocking | check | Column level and collection level |
Column hashing | check | |
Re-sync | check | Collection level |
History | | |
API configurable | check | |
Priority-first sync | | |
Fivetran data models | | |
Private networking | | |
Setup guide
In your primary DocumentDB instance, you need to do the following:
- Allow access to your DocumentDB database using Fivetran’s IP.
- Create a Fivetran-specific DocumentDB user with read-level permissions.
- Enable change streams on each collection that you want Fivetran to sync.
- Set the change stream log retention duration so that it retains at least 48 hours’ worth of changes. We recommend increasing the retention duration to seven days.
For specific instructions on how to set up your database, please follow our step-by-step DocumentDB setup guide to connect DocumentDB with your destination using Fivetran connectors.
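The change stream steps above can be sketched as follows. This is a minimal sketch, assuming the `modifyChangeStreams` admin command that Amazon DocumentDB documents for enabling change streams and the `change_stream_log_retention_duration` cluster parameter, which is expressed in seconds. The helper names are hypothetical, and the database and collection names are placeholders.

```python
# Sketch only: helper names are hypothetical; "sales" and "orders" are placeholders.
SECONDS_PER_HOUR = 3600


def enable_change_stream_command(database: str, collection: str) -> dict:
    """Build the admin command document that enables a change stream
    on a single collection (assumes DocumentDB's modifyChangeStreams)."""
    return {
        "modifyChangeStreams": 1,  # the command name must be the first key
        "database": database,
        "collection": collection,
        "enable": True,
    }


def retention_seconds(days: int = 0, hours: int = 0) -> int:
    """Convert a retention window into seconds, the unit assumed for
    the change_stream_log_retention_duration cluster parameter."""
    return (days * 24 + hours) * SECONDS_PER_HOUR


MINIMUM_RETENTION = retention_seconds(hours=48)     # 48 hours
RECOMMENDED_RETENTION = retention_seconds(days=7)   # seven days

command = enable_change_stream_command("sales", "orders")
```

With a MongoDB-compatible driver such as PyMongo, a command document like this could be sent with `client.admin.command(command)`. The retention value is set in the cluster parameter group rather than through the driver.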
Sync overview
Once Fivetran is connected to your primary DocumentDB instance, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your change streams to pull all your new and changed data at regular intervals.
Replication speeds
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies in the format of the data we receive versus how the data is stored at rest in the data destination.
The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.
Schema information
Fivetran tries to replicate the exact schema and collections from your DocumentDB source database to your destination according to our standard database update strategies. For every schema in the DocumentDB database that you connect, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- `_fivetran_deleted` (BOOLEAN) marks rows that were deleted in the source collection.
- `_fivetran_synced` (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformations and mapping
As we extract your data, we match DocumentDB data types to types that Fivetran supports. If we don’t support a data type, we automatically change that type to the closest supported type or, in some cases, don’t load that data at all. Our system automatically skips columns with data types that we don’t accept or transform.
The following table illustrates how we transform your DocumentDB data types into Fivetran-supported types:
DocumentDB Type | Fivetran Type | Fivetran Supported |
---|---|---|
Double | BsonType.DOUBLE | True |
String | BsonType.STRING | True |
Object | | False |
Array | | False |
Binary Data | BsonType.BINARY | True |
ObjectId | BsonType.OBJECT_ID | True |
Boolean | BsonType.BOOLEAN | True |
Date | BsonType.DATE_TIME | True |
Null | BsonType.NULL | True |
32-bit Integer (int) | BsonType.INT32 | True |
Timestamp | BsonType.TIMESTAMP | True |
64-bit Integer (long) | BsonType.INT64 | True |
MinKey | | False |
MaxKey | | False |
Decimal128 | | False |
Regular Expression | BsonType.REGULAR_EXPRESSION | True |
JavaScript | | False |
JavaScript (with scope) | | False |
Undefined | | False |
Symbol | | False |
DBPointer | | False |
If the first-level field is a simple data type, we map it to its own type. If it’s a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.
For example, the following JSON…
{"street" : "Main St.",
 "city" : "New York",
 "country" : "US",
 "phone" : "(555) 123-5555",
 "zip code" : 12345,
 "people" : ["John", "Jane", "Adam"],
 "car" : {"make" : "Honda",
          "year" : 2014,
          "type" : "AWD"}
}
…is converted to the following table when we load it into your destination:
_id | street | city | country | phone | zip code | people | car |
---|---|---|---|---|---|---|---|
1 | Main St. | New York | US | (555) 123-5555 | 12345 | ["John", "Jane", "Adam"] | {"make" : "Honda", "year" : 2014, "type" : "AWD"} |
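The first-level mapping described above can be illustrated with a short sketch. It is not Fivetran's implementation: the function name `to_destination_row` is hypothetical, and serializing nested values with `json.dumps` is an assumption standing in for the destination's JSON type.

```python
import json


def to_destination_row(document: dict) -> dict:
    """Map each first-level field to one column.

    Simple values keep their own type; arrays and nested objects are
    kept whole as JSON text instead of being unpacked into other tables.
    """
    row = {}
    for field, value in document.items():
        if isinstance(value, (dict, list)):
            row[field] = json.dumps(value)  # complex type: store as JSON, no unpacking
        else:
            row[field] = value              # simple type: store as is
    return row


document = {
    "street": "Main St.",
    "city": "New York",
    "country": "US",
    "phone": "(555) 123-5555",
    "zip code": 12345,
    "people": ["John", "Jane", "Adam"],
    "car": {"make": "Honda", "year": 2014, "type": "AWD"},
}

row = to_destination_row(document)
```

Because `people` and `car` stay in single columns, they can be queried later with the destination's JSON processing functions.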
Excluding source data
If you don’t want to sync all your data, you can exclude databases and collections from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
You cannot exclude fields from your syncs.
Initial sync
When Fivetran connects to a new DocumentDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a `db.collection.find()` operation on each collection. For large collections, we copy a limited amount of data at a time so that we don’t have to start the sync over from the beginning if our connection is lost midway.
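The idea of copying a limited amount of data at a time can be sketched with an in-memory stand-in for a collection. The source does not describe how batches are delimited, so ordering by `_id` and checkpointing on the last copied `_id` are assumptions made for illustration. The names `fetch_batch`, `copy_collection`, and `max_batches` are hypothetical.

```python
def fetch_batch(source, after_id, batch_size):
    """Return up to batch_size documents whose _id is greater than after_id,
    in _id order (stands in for a sorted, limited find() on the collection)."""
    ordered = sorted(source, key=lambda doc: doc["_id"])
    remaining = [doc for doc in ordered if after_id is None or doc["_id"] > after_id]
    return remaining[:batch_size]


def copy_collection(source, destination, checkpoint=None, batch_size=2, max_batches=None):
    """Copy documents batch by batch and return the last copied _id.

    max_batches simulates a connection that drops partway through.
    """
    batches_done = 0
    while max_batches is None or batches_done < max_batches:
        batch = fetch_batch(source, checkpoint, batch_size)
        if not batch:
            break  # nothing left to copy
        for doc in batch:
            destination[doc["_id"]] = dict(doc)
        checkpoint = batch[-1]["_id"]  # remember progress after every batch
        batches_done += 1
    return checkpoint


source = [{"_id": i, "value": i * 10} for i in range(1, 6)]
destination = {}

# First attempt is interrupted after one batch.
checkpoint = copy_collection(source, destination, max_batches=1)
copied_before_resume = sorted(destination)

# The retry resumes from the checkpoint instead of starting over.
checkpoint = copy_collection(source, destination, checkpoint=checkpoint)
```

The point of the checkpoint is that a lost connection costs at most one batch of repeated work rather than a full restart.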
Updating data
Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database. We use DocumentDB’s change streams to detect changes to the selected collections.
Fivetran uses DocumentDB’s built-in `_id` field as the primary key in the source collections. Using the `_id` field to identify rows, we merge changes to your documents into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with `_fivetran_deleted = FALSE`.
- Every updated row in the source updates the data in the corresponding row in the destination, with `_fivetran_deleted = FALSE`.
- For every deleted row, the `_fivetran_deleted` column value is set to `TRUE` for the corresponding row in the destination.
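The merge rules above can be sketched against an in-memory destination table keyed by `_id`. This is an illustration, not Fivetran's implementation. It assumes change events shaped like MongoDB-compatible change stream events (`operationType`, `documentKey`, `fullDocument`) and assumes that update events carry the full document. The function name `apply_change` is hypothetical.

```python
from datetime import datetime, timezone


def apply_change(destination: dict, event: dict, synced_at: datetime) -> None:
    """Merge one change event into the destination rows, keyed by _id."""
    doc_id = event["documentKey"]["_id"]
    operation = event["operationType"]

    if operation in ("insert", "update", "replace"):
        # Inserts create the row; updates overwrite the matching row.
        existing = destination.get(doc_id, {})
        destination[doc_id] = {
            **existing,
            **event["fullDocument"],
            "_fivetran_deleted": False,
            "_fivetran_synced": synced_at,
        }
    elif operation == "delete":
        # Deleted rows are kept and only flagged.
        row = destination.setdefault(doc_id, {"_id": doc_id})
        row["_fivetran_deleted"] = True
        row["_fivetran_synced"] = synced_at


destination = {}
now = datetime.now(timezone.utc)
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "city": "New York"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "city": "Boston"}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "city": "Chicago"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
for event in events:
    apply_change(destination, event, synced_at=now)
```

After these events, row 1 holds the updated city and is not flagged, while row 2 is still present but marked as deleted.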
Deleted data
We don’t remove deleted rows from the destination. Instead, we mark rows as deleted by setting the value of their Fivetran-created system column `_fivetran_deleted` to `TRUE`.
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won’t retain the same change tracking data as your original DocumentDB database.