Community

Option to Filter Data that is Synced

Not planned

Comments

17 comments

  • Alexander User

    Hi, product manager here! We are working on a number of options to give you more flexibility in what data you sync. The challenge is balancing that flexibility against 1) complexity and 2) unintended consequences for some customers. I'm curious to learn more - how would filtering IS_DELETED data help your business?

  • Cole User

    Hi Alexander,

    This issue pertains to our Salesforce connector. We use the "IS_DELETED" field to determine whether a client is terminated. We keep records of all of our clients even after they are terminated so that we can reach out to them again in the future. Therefore, we do not want to remove these clients from Salesforce, but rather filter our sync so that our Snowflake data accurately reflects our currently active clients' data.

    We have other use cases that would similarly benefit from the ability to filter the data we are syncing to Snowflake. Another connector type we use is Microsoft SQL Server. Adding a feature to filter the data syncing over to Snowflake would help us cut down on the steps needed to generate accurate datasets. It would also allow us to leverage Fivetran more efficiently and manage how our data is imported into various databases in Snowflake. Let me know if there is any other information that would be useful in helping to get this type of feature into Fivetran. Thanks!
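    In the meantime, a common workaround (a sketch, not a Fivetran feature — the schema, table, and column names below are assumptions) is to keep syncing everything and expose only active clients through a view in Snowflake:

    ```sql
    -- Hypothetical Snowflake view over the raw Fivetran-synced Salesforce table.
    -- "salesforce.account" and "is_deleted" are illustrative names.
    CREATE OR REPLACE VIEW analytics.active_clients AS
    SELECT *
    FROM salesforce.account
    WHERE NOT is_deleted;  -- hide terminated clients from downstream consumers
    ```

    This doesn't reduce sync time or storage, but it does keep reports and dashboards pointed at active clients only.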

  • Anna User

    Thanks for the context Cole! We don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support & maintain a high-quality Fivetran connector. Every upvote on this request increases the case to build it.

    We are looking for this functionality also. There are certain parent tables in SNOW that contain records for all modules, but we only wish to load a subset of these. Copying all records from SNOW and then deleting the records from the other modules in staging is not efficient.

    +1
    We basically want to apply this WHERE clause during the Fivetran sync in order to avoid outdated data being synced:

    WHERE _fivetran_deleted = false
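    Until a native filter exists, the same condition can be applied today as a view on top of the synced table. A minimal sketch for Snowflake (schema and table names are placeholders, not anything from Fivetran's docs):

    ```sql
    -- Hypothetical view that hides rows Fivetran has soft-deleted.
    CREATE OR REPLACE VIEW staging.orders_current AS
    SELECT *
    FROM raw.orders
    WHERE _fivetran_deleted = FALSE;  -- keep only live rows
    ```

    The deleted rows are still stored and still count toward storage, but queries against the view never see them.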

    I'm also interested in this.

    Use case: I have a huge table and it takes forever to fully sync. It would be nice to have the ability to sync only records above some date or ID.

    Very much would want this feature. There are rows in some of our source systems that provide no value to us in the data warehouse, so we don't want to sync them.

    +1

    (We're powered by Fivetran)

    We have huge tables where 90% of the data is not interesting. For example, we serve clients with 10+ years of data, but the data that is interesting and still updating is from the last two years. Syncing the entire table hurts us in two ways:

    1. The long initial sync time, which damages our ability to promise the client a one-day deployment.

    2. The heavy load and storage in the destination DB.

    I think this is a feature that can be supported at least in the most used connectors.

    +1 this would be super helpful to avoid sync of unwanted data

    This would be useful for us for all the reasons mentioned by others.

    Syncing GA4 via BigQuery, we want to keep all data exported from GA4 in BigQuery but exclude some events from being synced to our destination DWH via Fivetran.

    100% need this feature as well. It helps with security too: some records should not be accessible past a certain point and are kept only for retention-policy purposes.

    It would be very useful to have this feature. We have some tables with over 1M records where only about 25% of the records have data-analytics value.

    This feature would be very useful for moving huge volumes of data. It would be great to have an option that lets you specify a SQL command to filter the records you need by date, etc.

    This would help reduce sync time and help users meet tight deadlines, even in cases of an emergency re-sync.



    Use case:

    We have a multi-tenant database and want to sync only specific tenants, or sync each tenant to a separate destination.

    This feature would definitely be useful!

    I agree that row-level filtering would be a very useful option!

    We have a table of history data with a lot of unnecessary rows, which drastically increases our MAR in certain months. Since these rows are of no interest to us, there is no point pulling them into our data warehouse in the first place. For now we do not sync that particular table because of this issue. Being able to filter out the unnecessary rows based on a column filter would resolve this, as we could extract just what we need and not pay for unnecessary data loads.