Destination Improvement: Improve compatibility with CockroachDB as destination
Hello Fivetran Team,
I hope this email finds you well. I am reaching out to discuss some challenges we encountered during our recent attempts to migrate a large table with approximately 200 million rows from PostgreSQL to CockroachDB. We utilized the Postgres RDS driver for both databases; however, all 15 migration trials failed during the initial sync process. Our investigation revealed that the Fivetran implementation may lead to some challenges when migrating data to CockroachDB.
Based on the logs obtained from CockroachDB, it appears that Fivetran's migration process follows these steps on the destination side:
1) Creates a temporary staging table with `CREATE TEMPORARY TABLE`.
2) COPY data from stdin to the created staging table.
3) Initiates a transaction with `BEGIN TRANSACTION`.
4) DELETE rows from the target table by referencing the staging table.
5) INSERT rows into the target table using the staging table.
6) In case of an error, rollback the transaction, aborts the current sync, and reschedules another initial sync attempt.
We would greatly appreciate it if Fivetran could consider implementing the following improvements to enhance compatibility with CockroachDB:
1) Replacing Steps 4 and 5 (DELETE + INSERT) with an UPSERT statement in CockroachDB(https://www.cockroachlabs.com/docs/stable/upsert.html), which potentially offers better performance.
2) Utilizing a normal table (i.e., CREATE TABLE) instead of a temporary table for the staging tables. It is worth noting that the use of temporary tables in CockroachDB is currently in the preview stage and not recommended for production use. Ideally, we would like to minimize the usage of temporary tables as much as possible.
3) Implementing a retry mechanism for transactions that encounter errors during UPSERT from the staging table to the target table, instead of immediately aborting or rescheduling the current sync attempt.
4) Allowing users to configure the batch size of data to be upserted into the target table within each transaction. Currently, ~50 million rows seem to be written to the target table in each transaction, which means that any error results in a large rollback. We would prefer to apply a smaller batch size to mitigate such issues.
We appreciate your attention to these matters and believe that implementing these improvements will significantly enhance the migration process for CockroachDB users. Thank you for your time and consideration.
Best regards,
Jane Xing
-
Hi Jane,
We haven't explicitly added support for Cockroach DB as a destination, but these are good suggestions when we do. I had some questions about your suggestions:
- Thank you, that makes sense. Do you know what the expected performance difference will be?
- We probably do not want to use a regular table, because it would complicate cleanup for our system. Do you know when the temporary table feature will become stable? This looks like it has been in preview since 2020.
- What sort of errors are you observing? Generally an error in query processing is assumed to be a bug, because transient issues are hopefully not common with a destination database system.
- We optimize batch sizes for performance, and generally assume that database systems rarely have transient failures. I think similar to (3), I'd like to understand what is causing transient failures here.
Thank you!
Eric O'Connor
PM of Platform & Destinations
Please sign in to leave a comment.
Comments
1 comment