
Connector Improvement: Automatically delete absent/outdated rows

Completed

Ernie Farias User

March 14, 2022 18:50
Edited

For tables that have a schema, it would be useful if there was a way for custom connectors (AWS Lambda, Azure Functions, Google Cloud, etc.) to specify that any rows from that table should be deleted if their primary key does not appear in the latest batch of data.

Example scenario: An API offers an endpoint to get a full list of users (with an ID for each user) but does not offer an endpoint to track user deletions. Historically, this endpoint would return users [1, 2, 3, 4, 5]. Since then, users 2 and 4 have been deleted, so now the API returns users [1, 3, 5].

Current results: Although the API (and therefore the custom cloud connector) only returns users 1, 3, and 5, the table in our data warehouse still has all 5 users. The custom cloud connector has no way to know that our warehouse still contains users 2 and 4, so it cannot send a deletion request for them.

Desired results: Since Fivetran manages the rows in the warehouse and already knows about users 2 and 4, we should be able to set a flag value to specify that Fivetran should only keep rows in the table if they still appear in the latest set of data. In this case, since Fivetran wouldn't see users 2 and 4 in the result set of [1, 3, 5], it should mark users 2 and 4 as deleted.
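The requested behavior boils down to a set difference on primary keys: upsert everything in the latest full batch, then flag every stored row whose key was not in that batch. A minimal sketch, assuming rows are dicts with a hypothetical `id` primary key and the warehouse table is modeled as an in-memory dict. This illustrates the idea only; it is not how Fivetran implements it.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def apply_full_snapshot(
    table: Dict[Any, Row], latest_batch: Iterable[Row], pk: str = "id"
) -> List[Any]:
    """Upsert a full batch into `table` and soft-delete rows absent from it.

    `table` maps primary key -> row and stands in for the warehouse table.
    Returns the sorted primary keys that were newly marked as deleted.
    """
    seen = set()
    for row in latest_batch:
        key = row[pk]
        seen.add(key)
        # Upsert the row and make sure it is flagged as live.
        table[key] = {**row, "_fivetran_deleted": False}

    newly_deleted = []
    for key, row in table.items():
        # A stored key missing from the full batch no longer exists upstream.
        if key not in seen and not row.get("_fivetran_deleted", False):
            row["_fivetran_deleted"] = True
            newly_deleted.append(key)
    return sorted(newly_deleted)


# Historically users 1-5 were returned; the API now returns only 1, 3 and 5.
warehouse = {i: {"id": i, "_fivetran_deleted": False} for i in range(1, 6)}
print(apply_full_snapshot(warehouse, [{"id": 1}, {"id": 3}, {"id": 5}]))  # [2, 4]
```

This only works if each batch really is the complete list; applying it to a partial or paginated response would wrongly flag rows that simply were not fetched yet.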

TL;DR: a "deleteRowsNotPresentInLatestResponse" setting (but with a better name)


Comments

13 comments

Official comment

Alison Fivetranner
- August 01, 2023 22:57
- Edited
I'm pleased to let everyone know that we now support a 'soft delete' mode in our three Function connectors.

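You can now list a table in the "softDelete" node of the function response. This causes us to add a "_fivetran_deleted" system column to that table and set it to TRUE for all existing rows at the start of the sync; rows returned during that sync are then marked as not deleted.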

See our documentation for details.
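A minimal sketch of an AWS Lambda handler that opts a table into soft delete. Only the `softDelete` field name comes from this thread; the other response keys (`state`, `insert`, `schema` with `primary_key`, `hasMore`), the `users` table, and the `fetch_all_users` helper are assumptions to check against the Function connector documentation.

```python
import json
from typing import Any, Dict, List


def fetch_all_users() -> List[Dict[str, Any]]:
    # Hypothetical stand-in for the source API call returning the FULL user list.
    return [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cy"}, {"id": 5, "name": "Eve"}]


def lambda_handler(request: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    users = fetch_all_users()
    return {
        "state": request.get("state", {}),
        "insert": {"users": users},
        "schema": {"users": {"primary_key": ["id"]}},
        # Listing a table here asks Fivetran to mark rows absent from this sync
        # as deleted (_fivetran_deleted = TRUE).
        "softDelete": ["users"],
        "hasMore": False,
    }


if __name__ == "__main__":
    print(json.dumps(lambda_handler({"state": {}}), indent=2))
```

Because existing rows are flagged at the start of the sync, each sync of a soft-delete table has to return the complete table; returning only changed rows would flag everything else.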

Alison
Jared Duquette User
- March 14, 2022 18:55
A soft delete would work as well, following the existing _fivetran_deleted convention.
Alison Fivetranner
- August 01, 2023 22:57
- Edited
Hi Ernie,

Thank you very much for taking the time to provide a detailed write-up of this request. Do you have a particular source you are working with right now that you need this for? How often do you think you need this?

While we will explore our options, we don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support and maintain high-quality Fivetran connectors. Every upvote on this request strengthens the case for building it.

Best regards,

Alison
Ernie Farias User
- May 04, 2022 17:12
- Edited
Hi Alison,

In this case we needed it to bulk import user, group, and membership data from Azure Active Directory (AAD). Since Fivetran doesn't have a connector for AAD, we wrote a custom one using the Microsoft Graph API, but reliably keeping track of deletions was a hurdle.

We have a few other SaaS providers that also don't have a way to track deletions via API, so those custom connectors have the same issue. For now, we've worked around this by comparing each row's _fivetran_synced column to the latest value, but it's not always reliable, so it'd be nice to get _fivetran_deleted functionality for bulk imports.
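The workaround described above can be sketched as: find the newest `_fivetran_synced` value in the table and treat rows that lag it by more than some margin as deleted. A minimal sketch in Python (in practice this would be a warehouse query); the `id` column and the one-hour `tolerance` are hypothetical choices, and the margin exists because rows written by a single sync need not share one exact timestamp.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def probably_deleted(
    rows: List[Dict[str, Any]], tolerance: timedelta = timedelta(hours=1)
) -> List[Any]:
    """Return ids of rows whose _fivetran_synced lags the newest value by > tolerance.

    This is a heuristic: a sync that runs longer than `tolerance`, or a skipped
    sync, can produce false positives or negatives.
    """
    if not rows:
        return []
    latest = max(r["_fivetran_synced"] for r in rows)
    return [r["id"] for r in rows if latest - r["_fivetran_synced"] > tolerance]


rows = [
    {"id": 1, "_fivetran_synced": datetime(2022, 5, 4, 12, 0)},
    {"id": 2, "_fivetran_synced": datetime(2022, 3, 1, 9, 0)},
    {"id": 3, "_fivetran_synced": datetime(2022, 5, 4, 12, 10)},
    {"id": 4, "_fivetran_synced": datetime(2022, 3, 1, 9, 0)},
    {"id": 5, "_fivetran_synced": datetime(2022, 5, 4, 12, 20)},
]
print(probably_deleted(rows))  # [2, 4]
```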
Alison Fivetranner
- May 06, 2022 02:20
Hi Ernie,

Thank you so much for the additional details; they will really help us scope the opportunity.

Regards

Alison
Gaston Uriarte User
- June 10, 2022 14:00
Hi! We are having the same problem with Azure Blob Storage.

We use Fivetran to pull data from blobs that are in a Microsoft Azure container.
Whenever we create new blobs, the data updates correctly in our destination table. But when we delete a blob, it no longer appears in the container, yet its data is still visible in the table.
Judy Campion User
- September 15, 2022 14:49
We are pulling sales forecast data from an API via AWS Lambda. That API returns a complete current set. Some users have since left the company, and their data is still marked as active (_fivetran_deleted = false), even though they are no longer in the result set. I would have expected the connector to set _fivetran_deleted on records that are no longer present, since I did return a schema (indicating the primary key fields by which to merge), per https://fivetran.com/docs/functions/aws-lambda.

I was told by support that in order to delete these users, I need to figure that out myself and add it to the "delete" key. That would require me to connect to the destination data within the lambda and essentially do the merge myself.
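One way to populate the "delete" key without querying the destination is to remember the primary keys returned by the previous sync in the connector state and diff against them. A minimal sketch under these assumptions: the `user_ids` state key, the `users` table, and the shape of `delete` entries (one object per deleted primary key) are hypothetical and should be checked against the Function connector documentation. Note that the state grows with the table, so this suits small tables only.

```python
from typing import Any, Dict, List


def build_response(state: Dict[str, Any], users: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a response that deletes users seen last sync but missing now."""
    previous_ids = set(state.get("user_ids", []))
    current_ids = {u["id"] for u in users}
    # Keys returned by the previous sync but absent now were deleted upstream.
    removed = sorted(previous_ids - current_ids)
    return {
        # Persist the current key set so the next sync can diff against it.
        "state": {"user_ids": sorted(current_ids)},
        "insert": {"users": users},
        "delete": {"users": [{"id": i} for i in removed]},
        "schema": {"users": {"primary_key": ["id"]}},
        "hasMore": False,
    }


first = build_response({}, [{"id": i} for i in range(1, 6)])
second = build_response(first["state"], [{"id": 1}, {"id": 3}, {"id": 5}])
print(second["delete"])  # {'users': [{'id': 2}, {'id': 4}]}
```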

This seems like a bug, not an enhancement request. (support ticket 102532)
Justin Canney User
- October 09, 2023 14:06
Does anyone have any recommendations for how to handle this currently? I am in the same situation: the API gives me a list of users but has no endpoint for deleted users. Deleted users simply drop out of the response list, while in my warehouse they still appear to be active.

It would almost be better if Fivetran offered a kill & fill option, where the data is dropped and reloaded each time the connector syncs. That would allow my warehouse to accurately reflect the data from the API.
Alison Fivetranner
- October 09, 2023 23:24
Hi Justin,

I'm sorry you are experiencing difficulties.

Have you been able to explore the softDelete option? I think it might provide the 'kill and fill' experience you were suggesting.

Best regards

Alison
Justin Canney User
- October 10, 2023 12:15
I have tested the softDelete feature, but it is not working as I would expect.
I took the existing connector and modified the AWS Lambda to include the new softDelete field.

After making the changes, I synced the connector and then also ran a full historical resync. I checked the Snowflake table after each one, and the _fivetran_deleted field has not been added to my table. Is there anything additional that needs to be done in order for softDelete to work?

Please note that my Snowflake table has records that are "deleted" (meaning they are no longer in the current payload) and whose _FIVETRAN_SYNCED value is from 2+ months ago. I would expect these records to now show up with _fivetran_deleted = True.
Alison Fivetranner
- October 10, 2023 17:15
Hi Justin,

That sounds fishy! Can you please create a support ticket so we can troubleshoot, then identify and fix any issues we find?

I'm looking forward to getting this up and running for you.

Alison
Justin Canney User
- October 11, 2023 13:25
For anyone who finds this in the future, this may be useful. My connector was using S3 to transfer the data from the Lambda function to Fivetran. Initially I added the `softDelete` field only to the S3 file.
It turns out that the `softDelete` field must also be included in the Lambda response payload.
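A minimal sketch of that fix: build the S3 file body and the Lambda response together and put the same `softDelete` list in both. Only the "must be in both" point comes from this thread; which other keys belong in the file versus the response, and the `users` table, are assumptions to verify against the S3 sync section of the Function connector documentation. The upload itself (for example boto3's `s3.put_object(Bucket=..., Key=..., Body=file_body)`) is left out so the sketch runs without AWS.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical table name; shared so the file and the response cannot drift apart.
SOFT_DELETE_TABLES: List[str] = ["users"]


def build_s3_sync_payloads(
    state: Dict[str, Any], users: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Return (JSON body to upload to S3, dict to return from the Lambda)."""
    file_body = json.dumps({
        "insert": {"users": users},
        "schema": {"users": {"primary_key": ["id"]}},
        "softDelete": SOFT_DELETE_TABLES,
    })
    response = {
        "state": state,
        "hasMore": False,
        # Per the thread: softDelete in the S3 file alone was not enough;
        # it also has to appear in the Lambda response itself.
        "softDelete": SOFT_DELETE_TABLES,
    }
    return file_body, response
```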
Alison Fivetranner
- October 11, 2023 18:04
Hi Justin,

Thank you so much for following up, and I'm glad it was a quick fix.
Alison