Content of XML column shown incorrectly

Hello,

I'm on Support Bundle for Toad for Oracle 12.8.0.20

While testing an application involving XML messages I've found that a stored UTF-8 character ä (x'c3a4') is shown as two garbage characters (UTF-16 x'00c300a4')

To check where the issue arises, I've isolated one sample attribute containing the error and displayed it as xml and as varchar2. See attachment 1, shown very small below this line (I tried to enlarge but failed, sorry)

As you can see (hopefully) the right-hand result column continaing varchar2 data shows correctly Hochhäuser (please mind the a-umlaut ä character)
In the left-hand column the xml is shown "as is", but the ä is getting mangled into two garbage characters.

To figure out what happened I checked the hex contents of both fields to compare. Sure enough in the left-hand column we find a code sequence of x'00c300a4' (corrected for endian-ness) at offset x'6a':

while the right-hand column shows X'00E4' at offset x'6a'

To further corroborate I also dumped the input field that was used to generate the XML data:

It takes close scrutiny, but you'll find that leading 'Hochh' all take just 1 byte, just like trailing 'user', but the intervening ä is encoded as 195, 164, which is x'c3a4' and that is indeed the correct unicode UTF-8 representation for the ä character.

The difference between the two versions of the xml field now is partly explicable:
the correct version shows X'e4' translated from the orginal x'c3a4', whereas the incorrect version shows x'00c00c4' where apparently each by te has been prefixed with x'00' (in an attempt to translate into utf-16?)

Best regards,
Abe Kornelis

All,

I know codepage issues are complex. Is there anyone out there able to help me with this issue, please? The incorrectly rendered characters are giving me a pain in the err… head, actually.

Thanks in advance,

Abe

===

I had a similar problem but we involved using VARCHAR2 columns instead of XMLType. We get data through MQSeries and we’ve had conversion problem.

What’s your exact DB charset and what is your client NLS setting?

Our databases are all Unicode, UTF-8 encoded.

According to Toad Support Bundle my NLS is DUTCH_THE NETHERLANDS.WE8MSWIN1252

Does this relate in any way with your experience?

Abe

===

My DB was EE8ISO8859P2 but we got an UTF8 character. Toad wasn’t able to display the character with VARCHAR2 defined column.

Ok, that seems to be a different issue.

The original data resides in a varchar2 column and is displayed correctly by Toad.
When the exact same data is embedded in an xmltype, the data are shown incorrectly,
even though the data seems to have been stored correctly.

Abe

John,

thanks for your reply. If the error is indeed in the interface supplid by Oracle, then we’re toast. I tried your workaround, but that fails because Oracle 9 does not know about the GetClobVal method, or so it seems. Bad luck this has to happen on our outdated 9 system :frowning:

Anyway, seeing that there is nothin much we can do about it, I guess we’d better close this case.

Kind regards,
Abe

For me, GetClobVal() works on a 9.2.0.1 database with a 9.2.0.1 client.

A table alias must be used with GetClobVal(), so if you didn’t have a table alias, add it.

SELECT XML_COL.GetClobVal() as XML_COL – fails

FROM HAS_XML;

SELECT H.XML_COL.GetClobVal() as XML_COL – works

FROM HAS_XML H;

John,

thanks; I just learned something new, which is great!
And: I can now see the correct tekst data indeed.

I should have realized the interpreter needs an additional level of qualification
to be able to recognize that is’s not just table.column…

Anyway, xmltype.getclobval(table.column) works fine as well.
No doubt you already knew that.

Kind regards,
Abe