Amazon DynamoDB is a fully-managed, proprietary NoSQL database service that is offered as part of Amazon Web Services (AWS).
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | All tables and fields |
Custom data | check | All tables and fields |
Data blocking | check | Column level, table level, and schema level |
Column hashing | check | |
Re-sync | check | Table level |
History | check | |
API configurable | check | |
Priority-first sync | | |
Fivetran data models | | |
Private networking | check | AWS PrivateLink (DynamoDB on EC2 only) |
Supported configurations
Fivetran supports the following DynamoDB configurations:
Supportability Category | Supported Values |
---|---|
Maximum throughput * | 3.0 MBps |
Connector limit per database | No limit |
* Maximum throughput is your connector’s end-to-end update speed, measured in megabytes per second (MBps). We calculate the maximum throughput by averaging the number of rows synced per second during your connector’s last 3-4 syncs. To learn more about sync speed, see the Replication speeds section.
Setup guide
Follow our step-by-step DynamoDB setup guide to connect DynamoDB with your destination using Fivetran connectors.
Sync Overview
Once Fivetran is connected to your database, we pull a full dump of all selected data from your database. We then use DynamoDB Streams to pull all your new and changed data at regular intervals. If data in your master database changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination.
Sync mode options
Sync modes determine the form in which Fivetran delivers your data. You must choose a sync mode for each table you want Fivetran to sync. There are two sync modes: packed and unpacked. If your table has or will have more than 1000 unique first-level keys, you must use packed mode.
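As a rough way to check whether a table crosses that threshold, you could count the distinct first-level attribute names across its items. The sketch below assumes the items are already loaded as plain Python dicts; the function name `must_use_packed` and the `limit` parameter are illustrative and not part of Fivetran or AWS.

```python
from typing import Iterable, Mapping


def must_use_packed(items: Iterable[Mapping[str, object]], limit: int = 1000) -> bool:
    """Return True when the items hold more than `limit` unique first-level keys."""
    keys = set()
    for item in items:
        # Only top-level attribute names count; nested keys are ignored.
        keys.update(item.keys())
    return len(keys) > limit
```

Because different items in the same table can carry different attributes, the count is taken over the union of all items, not per item.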
Unpacked mode
If your table has fewer than 1000 first-level attributes, you have the option to have your data delivered unpacked. Fivetran unpacks one layer of nested fields and infers their types.
In unpacked mode, the following source table
{
"foo": 1, <== partition key and/or sort key
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
foo INTEGER | bar INTEGER | nested JSON |
---|---|---|
1 | 2 | {"baz":3} |
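The transformation in this example can be approximated in a few lines. This is a minimal sketch of the idea, not Fivetran's implementation: `unpack_item` is a hypothetical helper that keeps scalar first-level attributes as columns and serializes anything nested as JSON text.

```python
import json


def unpack_item(item: dict) -> dict:
    """Flatten one level: scalars stay as columns, nested values become JSON text."""
    row = {}
    for name, value in item.items():
        if isinstance(value, (dict, list)):
            # Nested maps and lists are not unpacked further.
            row[name] = json.dumps(value, separators=(",", ":"))
        else:
            row[name] = value
    return row


print(unpack_item({"foo": 1, "bar": 2, "nested": {"baz": 3}}))
# {'foo': 1, 'bar': 2, 'nested': '{"baz":3}'}
```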
Packed mode
All tables with more than 1000 first-level keys must have their data delivered packed, though you can select packed mode for smaller tables too.
In packed mode, the following source table
{
"foo": 1, <== partition key and/or sort key
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
foo INTEGER | data JSON |
---|---|
1 | {"foo":1, "bar":2, "nested":{"baz":3}} |
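A comparable sketch for packed mode, again only an illustration under assumptions: `pack_item` and its `key_names` argument are hypothetical, and the caller is assumed to know which attributes form the partition key and sort key.

```python
import json
from typing import Sequence


def pack_item(item: dict, key_names: Sequence[str]) -> dict:
    """Keep key attributes as columns and store the whole item in one JSON column."""
    row = {name: item[name] for name in key_names}
    # The full item, key attributes included, goes into the `data` column.
    row["data"] = json.dumps(item, separators=(",", ":"))
    return row


print(pack_item({"foo": 1, "bar": 2, "nested": {"baz": 3}}, ["foo"]))
# {'foo': 1, 'data': '{"foo":1,"bar":2,"nested":{"baz":3}}'}
```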
Switching sync modes
You can switch sync modes for a table at any time in your Fivetran dashboard. However, you must do a full re-sync for that table when you change sync modes. For example, you may have a table that originally had 700 first-level keys but now has 1100 first-level keys. You would need to change the sync mode for that table to packed and do a full-table re-sync.
To change the sync mode for a table, do the following:
- In the connector dashboard, go to the Setup tab.
- Click Edit connection details.
- In the connector setup form, change the Sync Mode. If you selected packed mode, then select the table(s) that you want to sync in packed mode. Any tables that you do not select will be synced in unpacked mode.
- Click Save & Test.
- Once the setup tests pass, initiate a full re-sync for the table(s) whose sync mode you changed.
Replication speeds
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies in the format of the data we receive versus how the data is stored at rest in the data destination.
The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.
Schema information
Fivetran tries to replicate the exact schema and tables from your DynamoDB source database to your destination.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks rows that were deleted in the source database.
- _fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformations and mapping
As we extract your data, we match DynamoDB data types to types that Fivetran supports. If we don’t support a data type, we automatically change that type to the closest supported type or, in some cases, don’t load that data at all. Our system automatically skips columns with data types that we don’t accept or transform.
The following table illustrates how we transform your DynamoDB data types into Fivetran supported types:
DynamoDB Type | Fivetran Type | Fivetran Supported |
---|---|---|
STRING | STRING | True |
BINARY | BINARY | True |
NUMBER | DECIMAL | True |
STRINGSET | JSON | True |
NUMBERSET | JSON | True |
BINARYSET | JSON | True |
MAP | JSON | True |
LIST | JSON | True |
BOOLEAN | BOOLEAN | True |
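The mapping above can be expressed as a lookup keyed by the type descriptors that DynamoDB's low-level API uses in attribute values (`S`, `N`, `B`, `SS`, `NS`, `BS`, `M`, `L`, `BOOL`). The sketch below is illustrative: the dictionary and the `fivetran_type` helper are hypothetical names, and returning `None` for a descriptor that the table does not list is an assumption about how an unmapped type might be signalled.

```python
from typing import Optional

# DynamoDB type descriptor -> Fivetran type, following the table above.
FIVETRAN_TYPE_BY_DESCRIPTOR = {
    "S": "STRING",
    "B": "BINARY",
    "N": "DECIMAL",
    "SS": "JSON",
    "NS": "JSON",
    "BS": "JSON",
    "M": "JSON",
    "L": "JSON",
    "BOOL": "BOOLEAN",
}


def fivetran_type(attribute_value: dict) -> Optional[str]:
    """Look up the destination type for an attribute value such as {"N": "3"}."""
    # An attribute value carries exactly one descriptor key.
    (descriptor,) = attribute_value.keys()
    return FIVETRAN_TYPE_BY_DESCRIPTOR.get(descriptor)
```

For example, `fivetran_type({"N": "3"})` returns `"DECIMAL"` and `fivetran_type({"M": {"baz": {"N": "3"}}})` returns `"JSON"`.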
Excluding source data
If you don’t want to sync all the data from your master database, you can exclude schemas or tables from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Column Blocking documentation.
Alternatively, you can change the permissions of the Fivetran-specific IAM role you created and restrict its access to certain tables.
Initial sync
When Fivetran connects to a new DynamoDB database, we scan through each of your selected tables one at a time to fetch your data. If we encounter a ProvisionedThroughputExceededException error message, we retry the sync with an exponential backoff strategy. We recommend that you have a high provisioned throughput read capacity for your tables so that Fivetran doesn’t encounter this error.
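For readers unfamiliar with the pattern, the following is a generic sketch of retrying with exponential backoff and jitter. It is not Fivetran's code: `ThroughputExceeded` is a hypothetical stand-in for ProvisionedThroughputExceededException, and `with_backoff`, `read_page`, and the default delays are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def with_backoff(
    read_page: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `read_page`, waiting longer after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return read_page()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # The base wait doubles each attempt; random jitter spreads out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.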
Updating data
Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database. We use the DynamoDB Streams Kinesis Adapter to process the change data in your DynamoDB Streams and fetch only the data that has changed since our last sync.
We merge changes to your tables into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
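The merge rules above can be sketched against an in-memory destination. This is an illustration only: `apply_change` and the dict-based destination are hypothetical, while `INSERT`, `MODIFY`, and `REMOVE` are the event names that DynamoDB Streams records carry.

```python
from typing import Dict, Hashable, Optional


def apply_change(
    destination: Dict[Hashable, dict],
    event: str,
    key: Hashable,
    row: Optional[dict] = None,
) -> None:
    """Merge one change record into `destination`, keyed by primary key."""
    if event in ("INSERT", "MODIFY"):
        # Inserts and updates write the new image and mark the row as live.
        destination[key] = {**(row or {}), "_fivetran_deleted": False}
    elif event == "REMOVE":
        # Deletes are soft: the row stays, only the flag flips.
        if key in destination:
            destination[key]["_fivetran_deleted"] = True
    else:
        raise ValueError(f"unknown event: {event}")
```

Note that a removed row keeps its last known values, which matches the soft-delete behavior described for the destination.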
DynamoDB Streams data retention limit
DynamoDB Streams have a retention period of 24 hours. Change records exist in the stream for one day and are then deleted. If your syncs fail for more than 24 hours, we issue a warning that you must do a full re-sync to make sure we capture the data changes we missed.
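If you monitor sync health yourself, the 24-hour window translates into a simple check. The sketch below assumes you track the time of the last successful sync on your side; `needs_full_resync` is a hypothetical helper, not a Fivetran API.

```python
from datetime import datetime, timedelta

# Retention period of DynamoDB Streams, as stated above.
STREAM_RETENTION = timedelta(hours=24)


def needs_full_resync(last_successful_sync: datetime, now: datetime) -> bool:
    """Return True when the gap since the last sync exceeds stream retention."""
    return now - last_successful_sync > STREAM_RETENTION
```

Both arguments should be timezone-aware, or both naive, so that the subtraction is valid.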
Deleted rows
We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE.