Connector Improvement: Automatically delete absent/outdated rows
AnsweredFor tables that have a schema, it would be useful if there was a way for custom connectors (AWS Lambda, Azure Functions, Google Cloud, etc.) to specify that any rows from that table should be deleted if their primary key does not appear in the latest batch of data.
Example scenario: An API offers an endpoint to get a full list of users (with an ID for each user) but does not offer an endpoint to track user deletions. Historically, this endpoint would return users [1, 2, 3, 4, 5]. Since then, users 2 and 4 have been deleted, so now the API returns users [1, 3, 5].
Current results: Although the API (and therefore, custom cloud connector too) only returns users 1, 3, and 5, the table in our data warehouse will still have 5 users. There isn't a way for the custom cloud connector to know that our warehouse still has users 2 and 4 (in order to send a deletion request).
Desired results: Since Fivetran manages the rows in the warehouse and already knows about users 2 and 4, we should be able to set a flag value to specify that Fivetran should only keep rows in the table if they still appear in the latest set of data. In this case, since Fivetran wouldn't see users 2 and 4 in the result set of [1, 3, 5], it should mark users 2 and 4 as deleted.
TL;DR: a "deleteRowsNotPresentInLatestResponse" setting (but with a better name)
-
Official comment
Hi Ernie,
Thank you very much for taking the time to provide a detailed write up of this request - do you have a particular source you are working with right now that you need this for? How often do you think you need this?
While we will explore our options, we don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support & maintain a high-quality Fivetran connectors. Every upvote on this request increases the case to build it.Best regards,
Alison
-
As a soft delete would be fine as well, sticking with the _fivetran_deleted convention.
-
Hi Alison,
In this case we needed it to bulk import user, group, and membership data from Azure Active Directory (AAD). Since Fivetran doesn't have a connector for AAD, we wrote a custom one using the Microsoft Graph API, but reliably keeping track of deletions was a hurdle.
We have a few other SaaS providers that also don't have a way to track deletions via API, so those custom connectors also have the same issue. For now, we've gotten around this by comparing the fivetran_synced column to the latest value, but it's not always reliable so it'd be nice to get fivetran_deleted functionality for bulk imports.
-
Hi Ernie,
Thank you so much for the additional details it will really help us scope the opportunity.
RegardsAlison
-
Hi! We are having the same problem with Azure Blob Storage.
We use Fivetran to pull data from blobs that are in a Microsoft Azure container.
Every time that we create new blobs the data updates well in our destination table, but when we have to delete one of them, the blobs will no longer appear in the container but they will still be visible in the table.
Please sign in to leave a comment.
Comments
5 comments