Community

Proper identifier for special characters when creating columns

Answered

Russell Jhun Bucsit User

October 14, 2020 08:09

Hello,

In the source file, there are two column headers which are named "Service Level (aban +ve)" and "Service Level (aban -ve)". When fivetran created these columns in a redshift table only one column was created, "service_level_aban_ve_".

We would like to request a feature that instead of automatically replacing all special characters with underscore (_) when creating redshift columns, fivetran should be able to detect some characters and replace them with a proper identifier. For example, % should be replaced by "pct", - should be replaced by "minus", + should be replaced by "plus" or # should be replaced by "num".

For detailed use case, please see attached two images that show a) the original columns in the source file and b) the created column in the redshift table.

There was already a request for this feature (https://support.fivetran.com/hc/en-us/requests/11250) which was closed the first time, and followed up 2 times ( https://support.fivetran.com/hc/en-us/requests/11913, https://support.fivetran.com/hc/en-us/requests/13194) with the latest reply recommending us to request directly here at the feature request portal.

Please sign in to leave a comment.

Comments

2 comments

Official comment

Kevin Kim Fivetranner
- October 27, 2020 22:12
Hi Russell,

Kevin from the Product team here - apologies for the delay! The team has been discussing potential solutions for this problem.

In order to stay in line with our core product principles, it is crucial that the solution to this situation is simple, predictable, and default for all connectors. For this particular case of special characters, we discussed several potential solutions:

1. Don't rename symbols, and keep column names the same when writing to destinations: This solution is not possible since many warehouses support different characters. For example, while Snowflake supports characters like '+' and '-', Azure Synapse does not. We have renaming conventions because the destinations we partner with all have different character set requirements.

2. Rename symbols to a set of readable strings (e.g. '+' resolves to 'plus'): This solution seemed promising at first, but there are many characters where we don't have context on what the symbol should mean. For example, 'dollars lost (-)' might intuitively become 'dollars lost (minus)', but what about a column such as 'long-term data' which should be 'long hyphen term data'. There are some columns where we don't know what the symbol should become, and might cause confusion when analysts see these names in their warehouse.

3. Whenever we detect a duplicate, append some number to the column name: An example of this would be when you create duplicate files on your computer. You might see a syntax like 'myColumn' and 'myColumn (1)'. This solution does not guarantee preventing duplicates for all custom column types, and we felt does that fix the core of the issue, which is having different columns that are resolving to the same name.

Each of these proposals creates issues/problems because the solutions do not clearly solve the underlying issue. Would you have any feedback on the discussion points above?

Also, would it be possible for you to run a script prior to Fivetran syncs to change the names of the columns to prevent the name duplications that occur?
Russell Jhun Bucsit User
- October 26, 2020 07:41
Hello support team, following up on this please. Thank you