Connector Improvement: Ability to skip/ignore delete from the source
PlannedWe have many tables that have a different retention policy between source and destination. Since redshift is a data warehouse and it's a scale out solution we want to store more data compared to the source (SQL Server). That's one of the reason why we pick Redshift to begin with. Unfortunately as of now we have to suck in all the deletes and basically ignore them in a different process.
Would be nice to have the ability to throw away the deletes.
-
Official comment
Hi all,
This feature—skipping source deletes for improved retention management—is currently under development and will be available in the near future.
We will post further updates on this thread as we approach general availability.
Best
-
Hi Yih-Yoon,
Thanks for your reply on our feature request portal. Could you clarify how you would like these deletes to be filtered out?
For example, we offer the fivetran_deleted column with our Soft Delete sync mode today. In your queries, you could ignore all deleted records that are synced into Redshift. Or, do you mean you want the connector to never upsert deletes into your destination? If so, how will you keep track of which records have been deleted or not within your Redshift warehouse?
-
Thank you for the feedback, here are some clarifications:
"In your queries, you could ignore all deleted records that are synced into Redshift": this is repetitive and time-consuming because we have to do it for every table (and we have 10,000's of them).
"Or, do you mean you want the connector to never upsert deletes into your destination? If so, how will you keep track of which records have been deleted or not within your Redshift warehouse?" Yes, this is what we're actually looking for, we want the connector to never upsert deletes into our destination, as we don't need to use 'deleted' records for most business cases. Essentially, we would like to have the option on Fivetran to: either NOT sync 'deleted' rows (where _fivetran_deleted = true) in any table, OR sync them is we choose to. In most business cases, 'deleted' rows are not useful for us to sync to our datawarehouse. Here is the same issue discussed on this link: https://community.looker.com/lookml-5/is-there-a-simple-way-to-always-filter-out-where-fivetran-deleted-true-28743
Thank you for your support. Looking forward to hearing from you. -
We need this as well.
We have a large “events” table in MySQL where periodically old events are deleted, but we want to keep those events in snowflake.
We have “soft delete” enabled, but the act of deleting old records causes us to go way over our MAR limit as those deletes are sync’d.
-
Similar to Joe, we have event tables in SQL Server that are retained for three months. When rows are deleted after the retention period has been exceeded, we are charged MAR to soft-delete the row in Snowflake.
For this use-case, we do not need the record to be soft-deleted in the destination.
This inability to ignore deletes prevents us from utilizing Fivetran due to the high cost.
Thank you!
Jason
Please sign in to leave a comment.
Comments
5 comments