Destination Improvement: Redshift Varchar DDL Length for MultiByte Characters
AnsweredAccording to the redshift docs, a column defined with VARCHAR(N) can store up to N bytes of text. In other db platforms (for example in MYSQL), VARCHAR(N) can store up to N characters. This subtle difference leads to incorrect table DDL when using Redshift as a destination and leads to strings that are too long for the destination table when the source database uses character sets that allow more than 1 byte per character.
For example, I set up a connection using MySQL as a data source, and Redshift as the destination, with a text column defined as `varchar(60)` using a `utfmb3` character set (allows up to 3 bytes per character). In this case, fivetran creates that column as `varchar(60)` in Redshift when it really should be creating it as `varchar(180)`. Similarly, a `utfmb4` character set in MySQL should be mapped to a `varchar(240)` column in Redshift.
I haven't tested this on other source DB types but I noticed this when writing custom ETL code and was curious if it was happening in Fivetran as well.
-
Official comment
Hi Freeman,
This is a valuable idea and aligns with the nuances of character storage across different database platforms. I have added your request to our feature improvements backlog for Redshift destinations.
To better understand your needs, could you share more about the specific challenges you’ve encountered due to this DDL mismatch? Are there particular workflows, error cases, or data types most affected by incorrect varchar sizing for multibyte characters?
We will keep the community updated on this thread with any progress.
Best regards,
Please sign in to leave a comment.
Comments
1 comment