Option to Filter Data that is Synced
Not planned
We would like to request an option that would allow us to filter data that is synced. For example, we have a column named IS_DELETED. When the value for this column is set to TRUE, we do not want this row of data to be synced over to our destination.
-
Hi, product manager here! We are working on a number of options to give you more flexibility in what data you sync. The challenge is balancing that flexibility against 1) complexity and 2) unintended consequences for some customers. I'm curious to learn more - how would filtering IS_DELETED data help your business?
-
Hi Alexander,
This issue pertains to our Salesforce connector. We use the "IS_DELETED" field to determine whether a client is terminated. We keep a record of all of our clients even after they are terminated so that we can reach out to them again in the future. Therefore, we do not want to remove these clients from Salesforce; rather, we want to filter our sync so that our Snowflake data accurately reflects our currently active clients. We have other use cases that would similarly benefit from the ability to filter the data we sync to Snowflake. Another connector type that we use is Microsoft SQL Server. A feature to filter the data syncing to Snowflake would cut down on the steps needed to generate accurate datasets. In addition, it would allow us to leverage Fivetran more efficiently and manage how our data is imported into various databases in Snowflake. Let me know if there is any other information that would be useful in helping to get this type of feature into Fivetran. Thanks!
-
Thanks for the context Cole! We don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support & maintain a high-quality Fivetran connector. Every upvote on this request increases the case to build it.
-
We are looking for this functionality also. There are certain parent tables in SNOW that contain records for all modules, but we only wish to load a subset of these. Copying all records from SNOW and then deleting the records from the other modules in staging is not efficient.
-
+1
We basically want to apply this WHERE clause during the Fivetran sync to avoid syncing outdated data:
WHERE _fivetran_deleted = false
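To make the request concrete, here is a minimal sketch of a sync step that applies a row filter before loading. It is not a Fivetran feature or API: `sync_filtered`, the `account` table, and the `is_deleted` column are hypothetical names, and two in-memory SQLite databases stand in for the source and the destination.

```python
import sqlite3

def sync_filtered(source, dest, table, where_clause):
    """Hypothetical helper: copy only the rows matching where_clause.

    The table name and predicate are interpolated directly into the SQL,
    so they must come from trusted configuration, never from user input.
    """
    # Read the column names from the source table definition.
    cols = [row[1] for row in source.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)

    # The filter runs on the source, so excluded rows are never transferred.
    rows = source.execute(
        f"SELECT {col_list} FROM {table} WHERE {where_clause}"
    ).fetchall()

    dest.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    dest.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    dest.commit()
    return len(rows)

# Stand-in source with one soft-deleted row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (id INTEGER, name TEXT, is_deleted INTEGER)")
source.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Acme", 0), (2, "Globex", 1), (3, "Initech", 0)],
)

dest = sqlite3.connect(":memory:")
synced = sync_filtered(source, dest, "account", "is_deleted = 0")
```

Because the predicate is evaluated at the source, the soft-deleted row never reaches the destination, which is the behavior being asked for in this thread.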
-
I'm also interested in this.
Use case: I have a huge table and it takes forever to fully sync. It would be nice to be able to sync only records above some date or id.
-
Very much would want this feature. There are rows in some of our source systems that provide no value to us in the data warehouse, so we don't want to sync them.
-
+1
(We're powered by Fivetran)
We have huge tables in which 90% of the data is not interesting. For example, we serve clients with 10+ years of data, and the interesting, still-updating data is from the last two years. Syncing the entire table hurts us in two ways:
1. The long initial sync time, which damages our ability to promise the client a one-day deployment.
2. The heavy load and storage in the destination DB.
I think this is a feature that can be supported at least in the most used connectors.
-
+1 this would be super helpful to avoid sync of unwanted data
-
This would be useful for us for all the reasons mentioned by others.
-
We sync GA4 via BigQuery. We want to keep all data exported from GA4 in BigQuery but exclude some events from being synced to our destination DWH via Fivetran.
-
100% need this feature as well. It also helps with security: some records should no longer be accessed after a certain point and are kept only for retention policy purposes.
-
Will be very useful to have this feature. We have some tables with over 1M records where only about 25% of the records have data analytics value.
-
This feature will be very useful in moving huge volumes of data. It would be great to have an option that allows you to specify a SQL command to filter the records you need by date, etc.
This would help reduce the sync time and help users keep up with tight deadlines, even in the case of an emergency re-sync.
-
Use case:
Multi-tenant database: we want to sync only specific tenants, or sync each tenant to a separate destination.
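A rough sketch of that routing logic, assuming each row carries a `tenant_id` field. The function `route_by_tenant` and the `tenant_<id>` destination naming are hypothetical and only illustrate the idea; they do not describe anything Fivetran provides.

```python
from collections import defaultdict

def route_by_tenant(rows, allowed_tenants=None):
    """Hypothetical helper: group rows by destination, one per tenant.

    If allowed_tenants is given, rows for any other tenant are dropped.
    """
    routed = defaultdict(list)
    for row in rows:
        tenant = row["tenant_id"]
        if allowed_tenants is not None and tenant not in allowed_tenants:
            continue  # tenant is filtered out of the sync entirely
        routed[f"tenant_{tenant}"].append(row)
    return dict(routed)

rows = [
    {"tenant_id": "a", "order_id": 1},
    {"tenant_id": "b", "order_id": 2},
    {"tenant_id": "a", "order_id": 3},
    {"tenant_id": "c", "order_id": 4},
]

# Sync only tenants "a" and "b", each to its own destination.
routed = route_by_tenant(rows, allowed_tenants={"a", "b"})
```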
-
This feature will definitely be useful!
-
I do agree with the fact that row level filtering would be a very useful option!
We have a table of history data with a lot of unnecessary rows, which drastically increases our MAR in certain months. As these rows are of no interest to us, there is no point in pulling them into our data warehouse in the first place. For now, we do not sync this table because of this issue. Being able to filter out the unnecessary rows with a column filter would resolve it, as we could extract just what we need and not pay for unnecessary data loads.
-
Has this functionality been added?
Thanks