Connector Improvement: DynamoDB connector repeatedly polls empty stream, slowing down sync
We are setting up a DynamoDB connector and plan to connect it to roughly 150 tables by the end of our implementation. During pre-purchase testing, we tested against 1 table with a high volume of updates (around 10,000 an hour, which is what we expect in aggregate across all 150 tables). In this test scenario, sync jobs executed very quickly, in around 2 minutes. We were very pleased with this performance.
Fast forward a couple of months, and we are starting to connect multiple DDB tables to Fivetran. These are very low-volume tables (fewer than 5 updates per hour). However, we have found that sync time seems to scale linearly with the number of connected tables, not with the volume of updates. For 8 connected tables, our sync job now takes 9 minutes. Extrapolating from that, connecting 150 tables could push the sync job to 2+ hours. We'd like to run the job every 5 to 15 minutes.
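To make the extrapolation concrete, here is a rough back-of-the-envelope sketch. It assumes sync time is purely proportional to table count with no fixed overhead, which is only my reading of the numbers above; the function name and defaults are made up for illustration.

```python
def estimate_sync_minutes(tables, observed_tables=8, observed_minutes=9.0):
    """Linearly extrapolate sync time from one observed data point.

    Assumption: time scales only with the number of connected tables
    (8 tables -> 9 minutes), ignoring any fixed per-job overhead.
    """
    minutes_per_table = observed_minutes / observed_tables
    return tables * minutes_per_table


estimate = estimate_sync_minutes(150)
print(f"{estimate:.0f} minutes (~{estimate / 60:.1f} hours)")
```

Under that assumption, 150 tables works out to well over 2 hours per sync, far beyond a 5 to 15 minute schedule.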
From the logs and from Fivetran Support, I understand the DDB shards are being checked 10 times per table, per sync job, synchronously. So even if there are no updates for a given table, that table is being "extracted" for upwards of 1 minute. Can this behavior be adjusted at all?
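As a rough illustration of why the synchronous checks add up, here is a small model of the polling cost. The 10 checks per table come from the logs and Support; the roughly 6 seconds per check is my own assumption (about 1 minute divided by 10 checks), and the concurrent variant is purely hypothetical, not something I know Fivetran supports.

```python
import math

CHECKS_PER_TABLE = 10    # from the logs / Fivetran Support
SECONDS_PER_CHECK = 6.0  # assumption: ~1 minute of "extraction" / 10 checks


def sequential_seconds(tables):
    """Every table is polled one after another, even if its stream is empty."""
    return tables * CHECKS_PER_TABLE * SECONDS_PER_CHECK


def concurrent_seconds(tables, workers):
    """Hypothetical: tables are polled in parallel batches of `workers`."""
    batches = math.ceil(tables / workers)
    return batches * CHECKS_PER_TABLE * SECONDS_PER_CHECK


print(sequential_seconds(150) / 60)              # minutes spent polling sequentially
print(concurrent_seconds(150, workers=10) / 60)  # minutes with 10 parallel pollers
```

Even a modest amount of parallelism, or fewer checks on empty streams, would bring the polling time for 150 tables much closer to the schedule we need.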
From what I can tell, again from the Fivetran logs and Snowflake query history, the actual processing and loading steps are very fast (even for a high volume of updates). The extraction step and its repeated checks seem to be the bottleneck when multiple tables are added.
For reference, here is my support ticket where they suggested I submit this as a feature request: https://support.fivetran.com/hc/en-us/requests/55937
Thank you!
-
Would also like to see this feature!