Community

Option to Filter Data that is Synced

Not planned

Comments

17 comments

  • Alexander User

    Hi, product manager here! We are working on a number of options to give you more flexibility in what data you sync. The challenge is balancing that flexibility against 1) complexity and 2) unintended consequences for some customers. I'm curious to learn more - how would filtering IS_DELETED data help your business?

  • Cole User

    Hi Alexander,

    This issue pertains to our Salesforce connector. We use the "IS_DELETED" field to determine whether a client is terminated. We keep records of all of our clients even after they are terminated so that we can reach out to them again in the future. Therefore, we do not want to remove these clients from Salesforce, but rather filter our sync so that our Snowflake data accurately reflects our currently active clients' data.

    We have other use cases that would similarly benefit from the ability to filter the data we are syncing to Snowflake. Another connector type we use is Microsoft SQL Server. Adding a feature to filter the data syncing over to Snowflake would help us cut down on the steps needed to generate accurate datasets. It would also allow us to leverage Fivetran more efficiently and manage how our data is imported into various databases in Snowflake. Let me know if there is any other information that would be useful in helping to get this type of feature into Fivetran. Thanks!
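    In the meantime, a common workaround (a sketch, not a Fivetran feature — the schema, table, and column names below are assumptions) is to keep syncing everything and expose only active clients through a view in Snowflake:

    ```sql
    -- Hypothetical Snowflake view over the raw Fivetran-synced Salesforce table.
    -- "salesforce.account" and "is_deleted" are illustrative names.
    CREATE OR REPLACE VIEW analytics.active_clients AS
    SELECT *
    FROM salesforce.account
    WHERE NOT is_deleted;  -- hide terminated clients from downstream consumers
    ```

    This doesn't reduce sync time or storage, but it does keep reports and dashboards pointed at active clients only.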

  • Anna User

    Thanks for the context Cole! We don't have this planned at this time. We will continue to collect customer demand to justify the commitment to support & maintain a high-quality Fivetran connector. Every upvote on this request increases the case to build it.

    We are looking for this functionality also. There are certain parent tables in SNOW that contain records for all modules, but we only wish to load a subset of these. Copying all records from SNOW and then deleting the records from the other modules in staging is not efficient.

    +1
    We basically want to apply this WHERE clause during the Fivetran sync in order to avoid outdated data being synced:

    WHERE _fivetran_deleted = false
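    Until a native filter exists, the same condition can be applied today as a view on top of the synced table. A minimal sketch for Snowflake (schema and table names are placeholders, not anything from Fivetran's docs):

    ```sql
    -- Hypothetical view that hides rows Fivetran has soft-deleted.
    CREATE OR REPLACE VIEW staging.orders_current AS
    SELECT *
    FROM raw.orders
    WHERE _fivetran_deleted = FALSE;  -- keep only live rows
    ```

    The deleted rows are still stored and still count toward storage, but queries against the view never see them.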

    I'm also interested in this.

    Use case: I have a huge table and it takes forever to fully sync. It would be nice to have the ability to sync only records above some date or ID.

    Very much would want this feature. There are rows in some of our source systems that provide no value to us in the data warehouse, so we don't want to sync them.

    +1

    (We're powered by Fivetran)

    We have huge tables where 90% of the data is not interesting. For example, we serve clients with 10+ years of data, but the data that is interesting and still updating is from the last two years. Syncing the entire table hurts us in two ways:

    1. The long initial sync time, which damages our ability to promise the client a one-day deployment.

    2. The heavy load and storage in the destination DB.

    I think this is a feature that can be supported at least in the most used connectors.

    +1 this would be super helpful to avoid sync of unwanted data

    This would be useful for us for all the reasons mentioned by others.

    Syncing GA4 via BigQuery, we want to keep all data exported from GA4 in BigQuery but exclude some events from being synced to our destination DWH via Fivetran.

    100% need this feature as well. It helps with security too: some records should not be accessible past a certain point and are kept only for retention-policy purposes.

    It would be very useful to have this feature. We have some tables with over 1M records where only about 25% of the records have data-analytics value.

    This feature would be very useful for moving huge volumes of data. It would be great to have an option that lets you specify a SQL command to filter the records you need by date, etc.

    This would help reduce sync time and help users meet tight deadlines, even in cases of an emergency re-sync.



    Use case:

    We have a multi-tenant database and want to sync only specific tenants, or sync each tenant to a separate destination.

    This feature would definitely be useful!

    I agree that row-level filtering would be a very useful option!

    We have a table of history data with a lot of unnecessary rows, which drastically increases our MAR in certain months. Since these rows are of no interest to us, there is no point pulling them into our data warehouse in the first place. For now we do not sync that particular table because of this issue. Being able to filter out the unnecessary rows based on a column filter would resolve this, as we could extract just what we need and not pay for unnecessary data loads.