Connector Improvement: Additional options for delimited file connectors (for example, S3 and SFTP)
When uploading files with different connectors (I experienced this with the S3 and SFTP connectors), there are two possible behaviors when a file does not follow the expected format:
- Fail: if the connector finds a row that doesn't match the expected format, the whole process fails.
- Skip: if the connector finds a row that doesn't match the expected format, then it skips it and continues with the following rows.
A third behavior could be added: upload the row anyway, handling the mismatch as follows:
- If the row has more columns than expected, then ignore them.
- If the row has fewer columns than expected, then leave those values in the destination as nulls.
- If there is a type mismatch (for example, the destination column is numeric and the value in the field is a string), then leave the destination field with a null value.
This should all be logged.
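To make the request concrete, here is a minimal sketch of the "upload anyway" rules in Python. It does not reflect how any connector is actually implemented: `coerce_row`, the `casts` list (one conversion function per destination column), and the warning messages are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, List, Optional, Tuple

def coerce_row(
    fields: List[str],
    casts: List[Callable[[str], Any]],
) -> Tuple[List[Optional[Any]], List[str]]:
    """Apply the proposed 'upload anyway' rules to one parsed row.

    Hypothetical helper: `casts` holds one conversion function per
    destination column (for example int, float, str).
    Returns the values to load plus the messages that should be logged.
    """
    warnings: List[str] = []
    expected = len(casts)

    if len(fields) > expected:
        # More columns than expected: ignore the extras.
        warnings.append(f"dropped {len(fields) - expected} extra column(s)")
    elif len(fields) < expected:
        # Fewer columns than expected: the missing ones become nulls.
        warnings.append(f"filled {expected - len(fields)} missing column(s) with null")

    values: List[Optional[Any]] = []
    for i, cast in enumerate(casts):
        if i >= len(fields):
            values.append(None)
            continue
        try:
            values.append(cast(fields[i]))
        except ValueError:
            # Type mismatch: store null instead of failing or skipping the row.
            warnings.append(f"column {i + 1}: {fields[i]!r} could not be converted; stored null")
            values.append(None)
    return values, warnings
```

For example, `coerce_row(["1", "abc", "x", "extra"], [int, float, str])` keeps the row, returns a null for the second column, drops the fourth, and reports both adjustments so they can be logged.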
Another feature that could be added to these connectors, separate from the new behavior above, is the ability to choose which columns to upload as a range, with these options:
- First column to upload: 2 (this will skip column 1)
- Number of columns: 3 (this uploads 3 columns starting at column 2, that is, columns 2, 3, and 4)
Any additional columns are skipped.
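The range options could behave like the following Python sketch. The function name `select_columns` and its parameters are hypothetical and only mirror the two options described above; column numbering is assumed to start at 1.

```python
from typing import List

def select_columns(fields: List[str], first_column: int, column_count: int) -> List[str]:
    """Keep `column_count` columns starting at the 1-based `first_column`.

    Hypothetical helper illustrating the proposed range options;
    columns before or after the range are dropped.
    """
    if first_column < 1:
        raise ValueError("first_column is 1-based and must be at least 1")
    if column_count < 0:
        raise ValueError("column_count must not be negative")
    start = first_column - 1  # convert to a 0-based index
    return fields[start:start + column_count]
```

With the settings from the example, `select_columns(["a", "b", "c", "d", "e"], 2, 3)` keeps columns 2, 3, and 4. If a row is shorter than the range, the slice simply returns the columns that exist.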
Official comment
Hi Gabriel,
Thank you for this great feedback. I love the idea of these additional modes; however, I do wonder when they would help and when they would hinder. One of the things we are very cautious about is ending up with poor-quality data in your warehouse. Wouldn't populating the row anyway end up producing a problem table more often than not?
I would love to hear more about your use cases, where these modes would work well for you. Can you share explicit examples? Similarly, for the column range case: which data sources would benefit from this feature, and why? In which cases would simply loading all the data not address the need just as cleanly, while sparing the person setting up the connector from configuring things and possibly getting them wrong?
Best regards,
Alison