Problems with encoding in Database Query

When I run a SELECT against fields that store data as JSON, I run into a problem with the data returned:

SELECT tbl_pf.nome,
       tbl_pf.adicionais::JSONB->'recebeTaxa' AS localTaxa
FROM   tbl_pf;

When I query the field so that only part of the stored JSON is returned, I get this error:

ERROR: unsupported Unicode escape sequence
DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.

When fetching the full JSON field, there is no error.

The query is run from the terminal, which is already configured to handle UTF8 data coming from the database. (I can run this kind of query against other tables that hold JSON data whose content has already been cleaned up, including numeric data.)

I would like to know whether this problem is only in the terminal, which cannot identify the database's encoding, or whether the database itself also has a "fault in the record", since a PHP test routine produced the same error.

1 answer


The error message pasted in the question comes from PostgreSQL, and it indicates that the database in question is not encoded in UTF-8.
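
To confirm this, the encodings involved can be inspected directly. A minimal sketch, assuming you can connect to the affected database with any SQL client:

```sql
-- Encoding of the database you are currently connected to
SHOW server_encoding;

-- Encoding the current session uses to talk to the server
SHOW client_encoding;

-- Encoding of every database in the cluster
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding
FROM   pg_database;
```

If `server_encoding` is anything other than `UTF8` (for example `LATIN1` or `SQL_ASCII`), the error above is expected regardless of the client's settings.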

The documentation indicates that it is not possible to fully support the JSON specification unless the database is encoded in UTF-8, since RFC 7159 specifies that JSON text uses this character set (in practice it also allows UTF-16 and UTF-32, but points out that UTF-8 provides better interoperability between systems).

More specifically for your case, the same page of the documentation clarifies:

However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8.

That is, if your database is not encoded in UTF8, you will get these conversion errors for "extended" characters every time you extract them from a jsonb value, because they cannot be converted to the database's character set for manipulation. I believe that no error occurs when you query the field as a whole because it is then treated simply as text, with no need to parse it in order to look up specific properties.
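
The difference can be reproduced without the original table. This is only a sketch: it assumes a database whose server encoding is not UTF8 and a PostgreSQL version that emits the DETAIL message quoted in the question; the sample value is made up for illustration.

```sql
-- As json, the \u escape is stored verbatim and not decoded,
-- so this should succeed even when the server encoding is not UTF8.
SELECT '{"recebeTaxa": "S\u00e3o Paulo"}'::json;

-- Casting to jsonb forces the escape to be decoded; on a non-UTF8
-- database this should raise "unsupported Unicode escape sequence".
SELECT '{"recebeTaxa": "S\u00e3o Paulo"}'::jsonb -> 'recebeTaxa';
```

This matches the query in the question, where the failure is triggered by the `::JSONB` cast rather than by reading the column itself.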

Therefore, whether on the command line or in any other client, Postgres should return this same error for the query presented. I suggest you convert your database to UTF8 in order to avoid encoding problems with the json and jsonb data types.
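
The encoding of an existing database cannot be changed in place, so the usual route is to create a new UTF8 database and reload the data into it. A rough sketch follows; the database name `mydb_utf8` and the locale `pt_BR.UTF-8` are assumptions, and the locale must exist on the server's operating system:

```sql
-- Create an empty database with UTF8 encoding.
-- template0 is required when the encoding or locale differs from template1.
CREATE DATABASE mydb_utf8
    ENCODING   'UTF8'
    LC_COLLATE 'pt_BR.UTF-8'   -- assumed locale, adjust to your system
    LC_CTYPE   'pt_BR.UTF-8'   -- assumed locale, adjust to your system
    TEMPLATE   template0;

-- Then dump the old database with pg_dump and restore the dump into
-- mydb_utf8; the data is converted to UTF8 while it is loaded.
```

If the old database is `SQL_ASCII`, no conversion is performed on restore, so the stored bytes may need to be checked and fixed by hand.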

If you need your client to "talk" to the database in another encoding, you can set the client_encoding parameter at connection time or in the user-specific settings, as described in this other answer here at SOP.
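
Both options might look like the following sketch, where `LATIN1` and the role name `app_user` are placeholders rather than values taken from the question:

```sql
-- For the current session only
SET client_encoding TO 'LATIN1';

-- As a per-user default, applied to every new session of this role
ALTER ROLE app_user SET client_encoding TO 'LATIN1';

-- Check what the session is actually using
SHOW client_encoding;
```

Note that this only changes how data is converted between server and client; it does not remove the jsonb restriction, which depends on the server encoding.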
