No solution for UTF8 (someday we may have the same problem as you’re now
facing), but we use this for non-UTF8 to improve search results on a practical
level for users, who might or might not enter accented characters in web form
text fields. Sounds like your earlier non-UTF8 function may have been
similar? Not sure if this will provide any alternative ideas, but thought I’d
share it.
-Ken Ho
FUNCTION our_ascii7(vstring IN VARCHAR2)
RETURN VARCHAR2
IS
local_string VARCHAR2(4000) :=
vstring;
/*
For a string, return our tailored version of US 7-bit ASCII for
it from the WE8MSWIN1252 used by the database (WE8ISO8859P1 before
that).
We didn’t want the straight US 7-bit ASCII results, so we use REPLACE and
TRANSLATE for our local tailoring before using CONVERT.
This is a “fuzzy” match intended to improve search results on
a practical level, given that stored data or data that is input for a
search
query cannot be guaranteed to be the same if it is not in USASCII7. This
matching does not pretend to define any “theoretical” correctness.
For example,
without this function appropriately invoked in search routines, an
accented name cannot be found if the non-accented version of it
is searched for (and the reverse situation is also true).
*/
BEGIN
-- Replace single characters with 2 characters:
-- German: ß.
-- Icelandic: Æ æ; Þ and þ ("thorn").
local_string :=
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(local_string, 'Æ',
‘AE’),
‘æ’,
‘ae’
),
'ß',
'ss'
),
'Þ',
'TH'
),
'þ',
'th'
);
-- Replace single characters that aren't converted the way we want
with
-- the eventual CONVERT call.
-- Icelandic: Ð ð ("eth").
local_string := TRANSLATE(local_string, 'Ã;ÅÐÑÕØãåðñõø',
‘AADNOOaadnoo’);
-- Use CONVERT with what we have already filtered:
local_string := CONVERT(local_string, 'US7ASCII', 'WE8MSWIN1252');
RETURN local_string;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
RETURN NULL;
WHEN OTHERS
THEN
RETURN NULL;
END our_ascii7;
