MongoDB is a document-oriented NoSQL database. Instead of fixed tables and fixed columns, it has collections (which are similar to tables) with dynamic schemas, and it stores data as JSON-like documents.
Supported services
Fivetran supports two different MongoDB configurations:
Supported configurations
Fivetran supports the following MongoDB configurations:
Supportability Category | Supported Values | Notes |
---|---|---|
MongoDB database versions | 2.6+ | |
MongoDB Atlas database versions | 4.0+ | |
Node types | Electable, Read-Only, and Analytics | Configure read preference and tags based on the node type. |
Maximum throughput * | 5.0 MBps | |
Connector limit per database | 3 | |
* Maximum throughput is your connector’s end-to-end update speed, measured in megabytes per second (MBps). We calculate the maximum throughput by averaging the number of rows synced per second during your connector’s last 3-4 syncs. To learn more about sync speed, see the Replication speeds section.
Network protocol | Supported Versions | Notes |
---|---|---|
Transport Layer Security (TLS) | TLS 1.0, TLS 1.1, TLS 1.2 | We can only support TLS versions that your corresponding version of the database supports. |
Known limitations
- Fivetran does not support the MongoDB Atlas M0, M2, or M5 tiers because these smaller managed tiers do not provide oplogs, which we need to perform incremental updates.
- Fivetran does not support syncing multi-document transactions for connectors using oplogs for incremental updates.
- Fivetran does not support syncing non-materialized views.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
Custom data | check | All tables and fields |
Data blocking | check | Column level and table level |
Column hashing | check | |
Re-sync | check | Table level |
History | check | |
API configurable | check | |
Priority-first sync | | |
Fivetran data models | | |
Private networking | check | AWS PrivateLink (MongoDB on EC2 only) |
Setup guide
For specific instructions on how to set up your database, see the guide for your MongoDB configuration:
Sync overview
Once Fivetran is connected to your MongoDB master database or read replica, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your oplogs or change streams to pull all your new and changed data at regular intervals.
Pack mode
Pack modes determine the form in which Fivetran delivers your data. You must choose a pack mode for each table you want Fivetran to sync. There are two pack modes: packed and unpacked.
Unpacked mode
By default, Fivetran unpacks one layer of nested fields and infers types.
In unpacked mode, the following source document
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id INTEGER | foo INTEGER | nested JSON |
---|---|---|
1 | 2 | {"baz":3} |
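As a rough illustration of one-level unpacking, the Python sketch below maps each top-level field to its own column and serializes nested values as JSON text. The names `infer_type` and `unpack_one_level` and the type-inference rules are assumptions made for this example; they are not Fivetran's implementation.

```python
import json

def infer_type(value):
    """Return an illustrative column type name for a top-level value."""
    if isinstance(value, bool):      # check bool before int: bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    return "JSON"                    # dicts, lists, and anything else stay as JSON

def unpack_one_level(document):
    """Map each top-level field to a (type, value) column; nested values become JSON text."""
    row = {}
    for field, value in document.items():
        column_type = infer_type(value)
        if column_type == "JSON":
            value = json.dumps(value, separators=(",", ":"))
        row[field] = (column_type, value)
    return row

source = {"_id": 1, "foo": 2, "nested": {"baz": 3}}
row = unpack_one_level(source)
```

Only the first layer is unpacked: `nested` stays a single JSON column rather than becoming a `baz` column.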
Packed mode
In packed mode, the following source document
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id INTEGER | data JSON |
---|---|
1 | {"_id":1, "foo":2, "nested":{"baz":3}} |
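Packed mode can be sketched in the same way: the key keeps its own column and the whole document goes into a single JSON column. The function name `pack` is hypothetical, and the compact JSON formatting is an assumption for illustration.

```python
import json

def pack(document, key="_id"):
    """Keep the key as its own column and store the whole document in a JSON `data` column."""
    return {
        key: document[key],
        "data": json.dumps(document, separators=(",", ":")),
    }

source = {"_id": 1, "foo": 2, "nested": {"baz": 3}}
packed_row = pack(source)
```

Because the `data` column holds the full document, new fields in the source do not add columns in the destination.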
Switching pack modes
You can switch pack modes for a table at any time in your Fivetran dashboard. When you change the pack mode for a table, we automatically perform a full table re-sync.
To change the pack mode for a table, do the following:
- In the connector dashboard, go to the Setup tab.
- Click Edit connection details.
- In the connector setup form, change the Pack Mode. If you selected packed mode, then select the table(s) that you want to sync in packed mode. Any tables that you do not select will be synced in default mode.
- Click Save & Test.
Replication speeds
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies in the format of the data we receive versus how the data is stored at rest in the data destination.
The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.
Schema information
Fivetran tries to replicate the exact schema from your MongoDB source database to your destination.
When you connect to Fivetran and specify a source database, you also select a schema prefix. We map the schemas we discover in your source database to your destination and prepend the destination schema names with the prefix you selected.
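The prefixing step can be pictured with a short Python sketch. The underscore separator, the function name `destination_schema`, and the database names are assumptions chosen for illustration; check your destination for the exact names Fivetran produces.

```python
def destination_schema(prefix, source_database, separator="_"):
    """Build a destination schema name by prepending the chosen prefix (separator is assumed)."""
    return f"{prefix}{separator}{source_database}"

# Hypothetical source databases discovered by the connector
source_databases = ["sales", "inventory"]
schemas = [destination_schema("mongo", name) for name in source_databases]
```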
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks rows that were deleted in the source collection.
- _fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformations and mapping
As we extract your data, we match MongoDB data types to types that Fivetran supports. If we don’t support a certain data type, we automatically match that data type to a regular Java type.
The following table illustrates how we transform your MongoDB data types into Fivetran-supported types:
MongoDB Type | Fivetran Type | Fivetran Supported | Notes |
---|---|---|---|
ARRAYLIST | JSON | True | Each element of array is recursively transformed based on type |
BINARY | JSON | True | |
BSON_ARRAY | JSON | True | Each element of array is recursively transformed based on type |
BSON_BINARY | JSON | True | |
BSON_BOOLEAN | BOOLEAN | True | |
BSON_DATETIME | INSTANT | True | INSTANT created using date-time milliseconds from EPOCH |
BSON_DB_POINTER | STRING | True | BsonDBPointer is transformed to its ID(String) |
BSON_DECIMAL_128 | BIGDECIMAL | True | |
BSON_DOUBLE | DOUBLE | True | |
BSON_INT_32 | INTEGER | True | |
BSON_INT_64 | LONG | True | |
BSON_NULL | NULL | True | |
BSON_OBJECT_ID | STRING | True | ObjectId value in String format |
BSON_REGULAR_EXPRESSION | STRING | True | RegEx pattern(String) of regular expression is used for transformation |
BSON_STRING | STRING | True | |
BSON_SYMBOL | STRING | True | |
BSON_TIMESTAMP | LONG | True | Timestamp is transformed to number of milliseconds from EPOCH |
BSON_UNDEFINED | | False | |
CODE | STRING | True | |
DATE | INSTANT | True | |
DECIMAL128 | BIGDECIMAL | True | If the decimal is NaN or Infinite, it is transformed to DECIMAL format |
MAX_KEY | | False | |
MIN_KEY | | False | |
OBJECT_ID | STRING | True | |
SYMBOL | STRING | True | |
UUID | STRING | True | |
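For quick reference in scripts, the table above can be expressed as a lookup. This is only a transcription of the table into Python; the names `TYPE_MAP`, `UNSUPPORTED`, and `fivetran_type` are hypothetical and not part of any Fivetran API.

```python
# Supported MongoDB types and their Fivetran types, copied from the table above.
TYPE_MAP = {
    "ARRAYLIST": "JSON",
    "BINARY": "JSON",
    "BSON_ARRAY": "JSON",
    "BSON_BINARY": "JSON",
    "BSON_BOOLEAN": "BOOLEAN",
    "BSON_DATETIME": "INSTANT",
    "BSON_DB_POINTER": "STRING",
    "BSON_DECIMAL_128": "BIGDECIMAL",
    "BSON_DOUBLE": "DOUBLE",
    "BSON_INT_32": "INTEGER",
    "BSON_INT_64": "LONG",
    "BSON_NULL": "NULL",
    "BSON_OBJECT_ID": "STRING",
    "BSON_REGULAR_EXPRESSION": "STRING",
    "BSON_STRING": "STRING",
    "BSON_SYMBOL": "STRING",
    "BSON_TIMESTAMP": "LONG",
    "CODE": "STRING",
    "DATE": "INSTANT",
    "DECIMAL128": "BIGDECIMAL",
    "OBJECT_ID": "STRING",
    "SYMBOL": "STRING",
    "UUID": "STRING",
}

# Types the table marks as not supported.
UNSUPPORTED = {"BSON_UNDEFINED", "MAX_KEY", "MIN_KEY"}

def fivetran_type(mongo_type):
    """Return the Fivetran type for a MongoDB type, or None if it is unsupported."""
    if mongo_type in UNSUPPORTED:
        return None
    return TYPE_MAP[mongo_type]   # raises KeyError for types the table does not list
```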
If we are missing an important data type that you need, please reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
Mapping
We map all first-level fields of your documents to columns in your destination. If the first-level field is a simple data type, we map it to its own type. If it’s a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.
For example, the following JSON…
{"street" : "Main St.",
"city" : "New York",
"country" : "US",
"phone" : "(555) 123-5555",
"zip code" : 12345,
"people" : ["John", "Jane", "Adam"],
"car" : {"make" : "Honda",
"year" : 2014,
"type" : "AWD"}
}
…is converted to the following table when we load it into your destination:
_id | street | city | country | phone | zip code | people | car |
---|---|---|---|---|---|---|---|
1 | Main St. | New York | US | (555) 123-5555 | 12345 | ["John", "Jane", "Adam"] | {"make" : "Honda", "year" : 2014, "type" : "AWD"} |
Excluding source data
If you don’t want to sync all your data, you can exclude databases and collections from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
You cannot exclude fields from your syncs.
Initial Sync
When Fivetran connects to a new MongoDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don’t have to start the sync over from the beginning if our connection is lost midway.
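One way to picture a resumable, batched copy is to page through a collection in _id order and remember the last _id written. The sketch below is an assumption about how such batching could work, not a description of Fivetran's internals; `copy_collection`, `fetch_batch`, and the in-memory `documents` list are hypothetical stand-ins.

```python
def copy_collection(fetch_batch, write_rows, batch_size=1000, checkpoint=None):
    """Copy documents in _id order and return the last _id written.

    fetch_batch(last_id, limit) must return up to `limit` documents whose _id
    is greater than last_id (or from the start when last_id is None), sorted by _id.
    """
    last_id = checkpoint
    while True:
        batch = fetch_batch(last_id, batch_size)
        if not batch:
            return last_id
        write_rows(batch)
        last_id = batch[-1]["_id"]   # checkpoint: the resume point if the connection drops

# In-memory stand-in for a collection. With pymongo, fetch_batch could be roughly:
#   query = {"_id": {"$gt": last_id}} if last_id is not None else {}
#   list(coll.find(query).sort("_id", 1).limit(limit))
documents = [{"_id": i, "foo": i * 2} for i in range(1, 8)]

def fetch_batch(last_id, limit):
    remaining = [d for d in documents if last_id is None or d["_id"] > last_id]
    return remaining[:limit]

copied = []
final_checkpoint = copy_collection(fetch_batch, copied.extend, batch_size=3)

# Resuming from a saved checkpoint copies only the documents after it.
resumed = []
copy_collection(fetch_batch, resumed.extend, batch_size=3, checkpoint=4)
```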
Updating data
Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database.
We use one of the following incremental sync methods to perform incremental updates:
Fivetran uses MongoDB’s built-in _id field as the primary key in the source tables. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
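The three merge rules above can be sketched with a destination table modeled as a Python dict keyed by _id. The function name `apply_change` is hypothetical, the sketch assumes an update carries the full new document, and the _fivetran_synced column is left out for brevity.

```python
def apply_change(table, operation, document):
    """Merge one source change into `table`, a dict of rows keyed by _id."""
    key = document["_id"]
    if operation in ("insert", "update"):
        # Inserts create the row and updates overwrite it; both leave it marked as live.
        table[key] = {**document, "_fivetran_deleted": False}
    elif operation == "delete":
        if key in table:
            table[key]["_fivetran_deleted"] = True   # soft delete: the row stays in place
    else:
        raise ValueError(f"unknown operation: {operation}")

destination = {}
apply_change(destination, "insert", {"_id": 1, "foo": 2})
apply_change(destination, "update", {"_id": 1, "foo": 5})
apply_change(destination, "delete", {"_id": 1})
```

After these three changes the destination still holds one row for _id 1, with the last known values and the delete flag set.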
Oplogs
Oplog (operations log) is a special capped collection that keeps a rolling record of all the operations that modify the data stored in the MongoDB databases. We use oplogs to detect the changes to the selected collections.
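To make change detection from the oplog more concrete, the sketch below filters simplified oplog entries down to data changes in the selected collections. The sample entries are trimmed to the `op`, `ns`, `o`, and `o2` fields (real entries carry more, and the oplog format is internal to MongoDB and can vary by version); `changes_for` and the namespace names are hypothetical.

```python
# Oplog operation codes for data changes; other codes (such as "n" and "c") are skipped.
OPERATIONS = {"i": "insert", "u": "update", "d": "delete"}

def changes_for(oplog_entries, selected_namespaces):
    """Yield (operation, namespace, _id) for data changes in the selected collections."""
    for entry in oplog_entries:
        operation = OPERATIONS.get(entry["op"])
        if operation is None or entry["ns"] not in selected_namespaces:
            continue   # skip no-ops, commands, and unselected collections
        # Updates carry the target _id in "o2"; inserts and deletes carry it in "o".
        id_holder = entry["o2"] if operation == "update" else entry["o"]
        yield operation, entry["ns"], id_holder["_id"]

sample_oplog = [
    {"op": "i", "ns": "shop.orders", "o": {"_id": 1, "total": 20}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"op": "u", "ns": "shop.orders", "o2": {"_id": 1}, "o": {"$set": {"total": 25}}},
    {"op": "d", "ns": "shop.carts", "o": {"_id": 9}},
]
changes = list(changes_for(sample_oplog, {"shop.orders"}))
```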
Change Streams Beta
Change streams allow applications to access the real-time data changes without the complexity and risk of tailing the oplog. Change streams support syncing multi-document transactions. We open a change stream against each selected collection.
To use change streams, you must use the following on your replica sets or sharded clusters:
- MongoDB version 4.0+
NOTE: You can also use change streams on deployments that employ MongoDB’s encryption at rest feature.
By default, we use oplogs for incremental updates. If you have the prerequisites to use change streams, contact support to enable change streams.
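For readers unfamiliar with change streams, the sketch below shows how change events could be reduced to row-level changes. The event fields used (`operationType`, `documentKey`, `fullDocument`) are standard change event fields; the function `to_row_change` and the sample events are hypothetical, and this is not Fivetran's implementation.

```python
ROW_OPERATIONS = {"insert", "replace", "update", "delete"}

def to_row_change(event):
    """Translate a change event into (operation, _id, full_document_or_None)."""
    operation = event["operationType"]
    if operation not in ROW_OPERATIONS:
        return None   # events such as drop or invalidate are ignored in this sketch
    key = event["documentKey"]["_id"]
    # Deletes have no fullDocument; updates include it only if the stream requests it.
    return operation, key, event.get("fullDocument")

# With pymongo, events would come from a stream opened per collection, roughly:
#   with collection.watch(full_document="updateLookup") as stream:
#       for event in stream: ...
sample_events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "foo": 2}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"foo": 5}, "removedFields": []}},
    {"operationType": "delete", "documentKey": {"_id": 1}},
    {"operationType": "invalidate"},
]
row_changes = [c for c in map(to_row_change, sample_events) if c is not None]
```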
Incremental sync cursor expiry
Oplog and change stream cursors expire when the time between two successive syncs leads to a loss of change data in the source database. Cursors may expire because the connector syncs too infrequently or because the oplog is too small to retain the change data between syncs. When cursors expire, we reschedule the connector’s sync and trigger automatic re-syncs:
- A full source re-sync when oplog cursors expire.
- A full table re-sync when change streams cursors for the table expire.
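The expiry condition can be sketched as a comparison between the connector's last synced position and the oldest change the source still retains. Timestamps are plain integers (seconds) here, and the names `cursor_expired` and `resync_scope` are hypothetical; this is an illustration of the idea, not Fivetran's logic.

```python
def cursor_expired(last_synced_ts, oldest_available_ts):
    """True when changes made after last_synced_ts have already rolled out of the log."""
    return last_synced_ts < oldest_available_ts

def resync_scope(method):
    """Re-sync scope for each incremental sync method, as listed above."""
    scopes = {
        "oplog": "full source re-sync",
        "change_streams": "full table re-sync",
    }
    return scopes[method]

# The oplog is a capped collection, so its oldest retained entry moves forward over time.
# Here the oplog retains entries from t=1000 onward.
oldest_in_oplog = 1000
in_window = cursor_expired(last_synced_ts=1200, oldest_available_ts=oldest_in_oplog)
fell_behind = cursor_expired(last_synced_ts=900, oldest_available_ts=oldest_in_oplog)
```

A higher sync frequency or a larger oplog keeps the last synced position inside the retained window.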
Deleted data
We don’t remove deleted rows from the destination. Instead, we mark rows as deleted by setting the value of their Fivetran-created system column _fivetran_deleted to TRUE.
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won’t retain the same change tracking data as your original MongoDB database.