Destination Improvement: Improve performance when removing a large existing table from a schema
Completed
Request: As a user, when I uncheck an entire table from a schema in a Postgres connector, Fivetran should not perform a linear number of operations marking every row in that table as deleted.
Context: We originally chose to sync a rapidly growing Postgres table to our destination, which resulted in hundreds of millions of rows being synced over the course of a year. This worked as intended. Later, during an audit, we decided that the table should not be part of our data warehouse, and we unchecked the entire table in the schema view of our Postgres connector. Fivetran then began sequentially marking rows as "deleted", one at a time, across this massive table; the operation timed out and began causing issues with the connector.
Why this matters: There was no business value to us in having the connector spend time marking every row. We ended up dropping the table in the destination after confirming with Fivetran that doing so would not impact the rest of our schema. (To be clear, we would not expect Fivetran to automatically drop the table for us just because we unchecked it from the schema, so it's good that that *didn't* happen.)
The work of attempting to mark every row of this massive table (which no longer mattered to our team) impaired our connector's ability to keep syncing other tables and rows, and the connector kept timing out and restarting while trying to update this massive set of rows. This caused days of downtime for our connector, and it may also have affected our monthly active rows if every "deleted" row was counted as active (it is unclear to us whether that is actually the case).
Official comment
Hi Christopher,
Apologies for such a delayed response! I've just tried the described scenario and it looks like it currently behaves as you would expect.
If the problem ever comes up again, please let us know.
Regards,
Val Kulichenko