Connector Improvement: S3 JSON files to Snowflake as raw
We use Fivetran for a few of our data loads but we are currently using another tool instead of Fivetran for loading JSON files from S3 into Snowflake. The issue we have with the way that Fivetran loads JSON files from S3 currently is that it's flattening the JSON upon load. This is not the preferred method by many of us that like to leverage the native power in Snowflake for querying unstructured data. We would love to be able to take the JSON and load it into a Variant column in Snowflake as a whole rather than the partial flattening that Fivetran does. To us, the partial flattening defeats the purpose since we are going to have to consume and flatten the other arrays in the JSON anyway. The other and more important benefit to loading it raw is that you get much better data lineage. Meaning, we know exactly what that JSON looked like before any transformations or data manipulation was done just by looking at the data in the loaded table. You don't get this clean data lineage with the current way Fivetran is loading this data since it's doing the partial flattening on load. I could be very wrong but it seems like a very simple addition or change to facilitate the standard Snowflake "COPY INTO" statement using the current connector. My thought was that you could have a check box that when checked, it would load the raw content into a single variant column in a table.
Please let me know if this doesn't make sense. I have lots of examples and even more reasons why we want to load raw CSV or JSON data into Snowflake. Thanks for taking the time to review this request. :)
Please sign in to leave a comment.
Comments
0 comments