Problem Summary: How to work around an "F_JG2901: String containing invalid sequence encountered while encoding from UTF-8-BMP to UTF-8-BMP" error message.
A customer reported receiving the following error:
2019-02-20T01:24:44-05:00: stageusmig-refrchar-ostg-gcstg: F_JG2901: String ‘Merry Christmas, Jo! Love You! xf0x9fx98x98xf0x9fx8ex85xf0x9fx8fxbbxf0x9fx8ex84’ containing invalid sequence encountered while encoding from UTF-8-BMP to UTF-8-BMP for table ‘<table_name>’ column ‘<column_name>’.
F_JT1458: The previous error occurred while Project pipe was processing row 299089 of table ‘<table_name>’ (row 299089 for Project pipe).
In this case, the customer's environment was as follows:
- Hub is an Oracle 126.96.36.199 database running on Linux
- Source database is an Oracle 188.8.131.52 database also running on Linux
- Target is a Cloud SQL 5.7 database running in a GCP instance
HVR Support examined the log file and found emoji embedded in the data.
For example, xf0x9fx98x98 (the four bytes F0 9F 98 98) is the UTF-8 byte sequence for Unicode U+1F618, perhaps better known as 😘.
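The mapping from the logged bytes to code points can be checked with a short Python sketch. It assumes the x-prefixed pairs in the log are hexadecimal byte values; the helper name log_hex_to_text is hypothetical and is not part of HVR.

```python
import unicodedata


def log_hex_to_text(fragment):
    """Decode a fragment such as 'xf0x9fx98x98' (assumed to be hex byte
    values, each prefixed with 'x') into the text those bytes represent."""
    raw = bytes.fromhex(fragment.replace("x", ""))
    return raw.decode("utf-8")


# The trailing byte run from the string in the error message above.
decoded = log_hex_to_text("xf0x9fx98x98xf0x9fx8ex85xf0x9fx8fxbbxf0x9fx8ex84")

# Collect each character's code point and official Unicode name.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
names = [unicodedata.name(ch) for ch in decoded]

for cp, name in zip(code_points, names):
    print(cp, name)
```

Each of the four characters occupies four bytes in UTF-8, which is the signature of a code point above U+FFFF.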
We recommended adding the actions TableProperties /CoerceErrorPolicy=WARNING and TableProperties /CoerceErrorType=ENCODING to the channel, then regenerating the refresh job and either running the refresh on demand or scheduling it to run.
We also informed the customer that U+1F618 is not within the Unicode BMP (Basic Multilingual Plane, U+0000 to U+FFFF), which is the range covered by the UTF-8-BMP encoding that the source Oracle DBMS is using. It belongs instead to the SMP (Supplementary Multilingual Plane, U+10000 to U+1FFFF), so the channel actions described above would not be necessary if the customer were to use the full UTF-8 character set for their Oracle source database.
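To see which values in a column would trigger this error before running a refresh, exported text can be scanned for characters outside the BMP. This is a minimal sketch, not an HVR feature: the function name non_bmp_chars is hypothetical, and the sample string is rebuilt from the bytes shown in the error message.

```python
def non_bmp_chars(text):
    """Return (index, code point) pairs for every character above U+FFFF,
    i.e. every character that lies outside the Basic Multilingual Plane."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# Sample value reconstructed from the bytes in the error message.
sample = "Merry Christmas, Jo! Love You! \U0001F618\U0001F385\U0001F3FB\U0001F384"

offenders = non_bmp_chars(sample)
for index, code_point in offenders:
    print(f"position {index}: {code_point}")
```

An empty result means every character in the string fits in the BMP and should encode without the F_JG2901 error.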