Toad World® Forums

I need a function to convert accented characters to non accented characters


#1

We have our own function which does this via a translate and it uses lots of
character strings with the accented characters. This was not a problem in
NON-UTF8 but we are now going to UTF8 and the possible accented characters are
legion.

I tried experiments with CONVERT specifying US7ASCII and that works pretty well.
However it is interesting to see what oracle considers an accent.

For instance the Đ is considered an accented character and converted to a D .
But any character with a tilde ( ~ ) is considered a character and thus not
converted. For instance Ñ is not considered an N and thus is converted to a
question mark.

Does anybody have a nice, small and easy function to remove accents and leave
non accented characters as is.


#2

No solution for UTF8 (someday we may have the same problem as you’re now
facing), but we use this for non-UTF8 to improve search results on a practical
level for users, who might or might not enter accented characters in web form
text fields. Sounds like your earlier non-UTF8 function may have been
similar? Not sure if this will provide any alternative ideas, but thought I’d
share it.

-Ken Ho

FUNCTION our_ascii7(vstring IN VARCHAR2)

  RETURN VARCHAR2

IS

  local_string                  VARCHAR2(4000) :=

vstring;

  /*

For a string, return our tailored version of US 7-bit ASCII for

it from the WE8MSWIN1252 used by the database (WE8ISO8859P1 before

that).

We didn’t want the straight US 7-bit ASCII results, so we use REPLACE and

TRANSLATE for our local tailoring before using CONVERT.

This is a “fuzzy” match intended to improve search results on

a practical level, given that stored data or data that is input for a
search

query cannot be guaranteed to be the same if it is not in USASCII7. This

matching does not pretend to define any “theoretical” correctness.

For example,

without this function appropriately invoked in search routines, an

accented name cannot be found if the non-accented version of it

is searched for (and the reverse situation is also true).

*/

BEGIN

  -- Replace single characters with 2 characters:

  -- German: ß.

  -- Icelandic: Æ æ; Þ and þ ("thorn").

  local_string :=

     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(local_string, 'Æ',

‘AE’),

‘æ’,

‘ae’

                                 ),

                             'ß',

                             'ss'

                            ),

                     'Þ',

                     'TH'

                    ),

             'þ',

             'th'

            );

  -- Replace single characters that aren't converted the way we want

with

  -- the eventual CONVERT call.

  -- Icelandic:  Ð ð ("eth").

  local_string := TRANSLATE(local_string, 'Ã;ÅÐÑÕØãåðñõø',

‘AADNOOaadnoo’);

  -- Use CONVERT with what we have already filtered:

  local_string := CONVERT(local_string, 'US7ASCII', 'WE8MSWIN1252');

  RETURN local_string;

EXCEPTION

  WHEN NO_DATA_FOUND

  THEN

     RETURN NULL;

  WHEN OTHERS

  THEN

     RETURN NULL;

END our_ascii7;
image001.jpeg