Connector Improvement: Automatically delete absent/outdated rows
Completed

For tables that have a schema, it would be useful if custom connectors (AWS Lambda, Azure Functions, Google Cloud Functions, etc.) could specify that rows should be deleted from the table when their primary key does not appear in the latest batch of data.
Example scenario: An API offers an endpoint to get a full list of users (with an ID for each user) but does not offer an endpoint to track user deletions. Historically, this endpoint would return users [1, 2, 3, 4, 5]. Since then, users 2 and 4 have been deleted, so now the API returns users [1, 3, 5].
Current results: Although the API (and therefore the custom cloud connector, too) only returns users 1, 3, and 5, the table in our data warehouse will still have 5 users. There is no way for the custom cloud connector to know that our warehouse still has users 2 and 4 (in order to send a deletion request).
Desired results: Since Fivetran manages the rows in the warehouse and already knows about users 2 and 4, we should be able to set a flag value to specify that Fivetran should only keep rows in the table if they still appear in the latest set of data. In this case, since Fivetran wouldn't see users 2 and 4 in the result set of [1, 3, 5], it should mark users 2 and 4 as deleted.
TL;DR: a "deleteRowsNotPresentInLatestResponse" setting (but with a better name)
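For concreteness, here is a minimal sketch of what such a response could look like from an AWS Lambda Function connector. The flag name is hypothetical (as the TL;DR says, a better name is welcome); the surrounding keys follow the response format documented for Function connectors:

```python
# Hypothetical sketch only: "deleteRowsNotPresentInLatestResponse" is the
# proposed flag, not a real Fivetran option. The other keys follow the
# documented Function connector response format.
def lambda_handler(request, context):
    users = [{"id": 1}, {"id": 3}, {"id": 5}]  # full snapshot from the source API
    return {
        "state": {},
        "schema": {"user": {"primary_key": ["id"]}},
        "insert": {"user": users},
        # Proposed behavior: any existing "user" row whose primary key is
        # absent from this response gets marked as deleted by Fivetran.
        "deleteRowsNotPresentInLatestResponse": ["user"],
        "hasMore": False,
    }
```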
-
Official comment
I'm pleased to let everyone know that we now support a 'soft delete' mode in our three Function connectors.
You can now include a table in a "soft delete" response node, which causes us to add a "_fivetran_deleted" system column and set it to TRUE for all existing rows at the start of the sync.
See our documentation for details.

Alison
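For illustration, a sync response using this mode might look roughly like the sketch below. The "softDelete" node name is an assumption based on the Function connector docs, so verify it there:

```python
# Sketch of the soft-delete mode described above. The "softDelete" node name
# is an assumption; check the Function connector docs for the exact format.
def lambda_handler(request, context):
    users = [{"id": 1}, {"id": 3}, {"id": 5}]  # current full snapshot
    return {
        "state": {},
        "schema": {"user": {"primary_key": ["id"]}},
        # At the start of the sync, _fivetran_deleted is set to TRUE for all
        # existing "user" rows; rows upserted below are flipped back to FALSE,
        # so only rows absent from the snapshot stay flagged as deleted.
        "softDelete": ["user"],
        "insert": {"user": users},
        "hasMore": False,
    }
```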
-
A soft delete would be fine as well, sticking with the _fivetran_deleted convention.
-
Hi Ernie,
Thank you very much for taking the time to provide a detailed write-up of this request. Do you have a particular source you are working with right now that you need this for? How often do you think you would need this?
While we will explore our options, we don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support and maintain high-quality Fivetran connectors. Every upvote on this request increases the case to build it.

Best regards,
Alison
-
Hi Alison,
In this case we needed it to bulk import user, group, and membership data from Azure Active Directory (AAD). Since Fivetran doesn't have a connector for AAD, we wrote a custom one using the Microsoft Graph API, but reliably keeping track of deletions was a hurdle.
We have a few other SaaS providers that also don't have a way to track deletions via API, so those custom connectors have the same issue. For now, we've gotten around this by comparing the _fivetran_synced column to the latest value, but it's not always reliable, so it'd be nice to get _fivetran_deleted functionality for bulk imports. (A rough sketch of the workaround is below.)
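For reference, the workaround amounts to treating any row whose sync timestamp lags the latest sync as deleted. A rough sketch, assuming a users table with the standard _fivetran_synced column; the exact cutoff logic will vary by warehouse:

```python
# Rough sketch of the _fivetran_synced workaround: rows not touched by the
# most recent sync are treated as deleted. Fragile in practice, e.g. a sync
# that fails partway through leaves stale timestamps and false positives.
IMPLICITLY_DELETED_SQL = """
SELECT id
FROM users
WHERE _fivetran_synced < (SELECT MAX(_fivetran_synced) FROM users)
"""
```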
-
Hi Ernie,
Thank you so much for the additional details; they will really help us scope the opportunity.
Regards,

Alison
-
Hi! We are having the same problem with Azure Blob Storage.
We use Fivetran to pull data from blobs that are in a Microsoft Azure container.
Every time we create new blobs, the data updates correctly in our destination table, but when we delete one, the blob no longer appears in the container while its rows remain visible in the table.

-
We are pulling sales forecast data from an API via AWS Lambda. That API returns a complete current set. Some users have since left the company, and their data is still marked as active (_fivetran_deleted = false), even though they are no longer in the result set. I would have expected the connector to mark missing records as _fivetran_deleted, since I did return a schema (indicating the primary key fields by which to merge), per https://fivetran.com/docs/functions/aws-lambda.
I was told by support that in order to delete these users, I need to figure that out myself and add the rows to the "delete" key. That would require me to connect to the destination data from within the Lambda and essentially do the merge myself (see the sketch after this comment).
This seems like a bug, not an enhancement request. (support ticket 102532)
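To make the suggested workaround concrete, here is a rough sketch of what it would require. The fetch_forecasts and fetch_warehouse_ids helpers are hypothetical stand-ins; the second one is the sticking point, since it means the Lambda must query the destination itself:

```python
# Sketch of the workaround support suggested: the function computes which
# primary keys vanished from the snapshot and returns them under the "delete"
# node, which marks those rows as deleted in the destination.

def fetch_forecasts():
    """Hypothetical stand-in for the real API call; returns the full snapshot."""
    return [{"id": 1}, {"id": 3}, {"id": 5}]

def fetch_warehouse_ids():
    """Hypothetical stand-in for querying the destination for existing ids."""
    return {1, 2, 3, 4, 5}

def lambda_handler(request, context):
    current = fetch_forecasts()
    current_ids = {row["id"] for row in current}
    missing_ids = fetch_warehouse_ids() - current_ids  # ids gone from the API
    return {
        "state": {},
        "schema": {"forecast": {"primary_key": ["id"]}},
        "insert": {"forecast": current},
        # Rows listed here are marked as deleted (_fivetran_deleted = TRUE).
        "delete": {"forecast": [{"id": i} for i in missing_ids]},
        "hasMore": False,
    }
```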