Finding high-ASCII characters in PL/SQL?

We have a few misguided developers who cut and paste from Microsoft documents into inline comments in PL/SQL, which leaves smart quotes, the fat dash, and other high-ASCII characters in these files. Of course, if these are in comments, the code deploys without error, and we see upside-down question marks when viewing the database code.

Can Toad help us identify high-ASCII characters and perhaps prevent them from being deployed?

Regards,
Doug

Hi Doug,

I did a quick check and, AFAIK, Toad doesn't do this. You might be able to do it in PL/SQL, but it would be very awkward.

You can ask any AI assistant to write this for you, and it'll give you code in Python, Java, PowerShell, bash, or whatever scripting language you choose.

Here's some Python code from MS365 Copilot.

HTH,
Andy

```python
import re

# Define replacements for common high ASCII characters
replacements = {
    '“': '"', '”': '"',  # smart double quotes
    '‘': "'", '’': "'",  # smart single quotes
    '–': '-', '—': '-',  # en dash and em dash
    '…': '...',          # ellipsis
    '•': '*',            # bullet
    '\u00a0': ' ',       # non-breaking space (escaped because it is invisible)
}


def clean_text(text):
    # Replace known high ASCII characters
    for char, replacement in replacements.items():
        text = text.replace(char, replacement)
    # Remove any remaining characters with a code point > 127
    text = re.sub(r'[^\x00-\x7F]+', '', text)
    return text


# Example usage
input_file = 'input.txt'
output_file = 'cleaned_output.txt'

with open(input_file, 'r', encoding='utf-8') as f:
    content = f.read()

cleaned_content = clean_text(content)

with open(output_file, 'w', encoding='ascii', errors='ignore') as f:
    f.write(cleaned_content)

print(f"Cleaned text saved to {output_file}")
```
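Since the original question is about identifying these characters before they are deployed, rather than silently rewriting them, here is a minimal sketch of a read-only checker. It assumes the PL/SQL lives in text files that are normally UTF-8; the function names and file names are hypothetical. It reports file, line, column, and code point for every character above 127 and exits with status 1 if it finds any.

```python
import sys
from pathlib import Path


def find_non_ascii(path):
    """Yield (line_no, col_no, char) for every character above code point 127."""
    # errors="replace" turns bytes that are not valid UTF-8 (e.g. a file saved as
    # Windows-1252) into U+FFFD, so they are still reported instead of crashing.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # split("\n") instead of splitlines(), which would also break on characters
    # such as U+2028 and shift the line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col_no, ch


def check_files(paths):
    """Print one report line per offending character; return how many were found."""
    count = 0
    for path in paths:
        for line_no, col_no, ch in find_non_ascii(path):
            # !a keeps the report pure ASCII so it prints on any console.
            print(f"{path}:{line_no}:{col_no}: U+{ord(ch):04X} {ch!a}")
            count += 1
    return count


def main(argv):
    """Return a shell exit status: 1 if any listed file contains non-ASCII text."""
    return 1 if check_files(argv) else 0


# Only run as a command when file names are passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `check_non_ascii.py` and run as `python check_non_ascii.py my_pkg.pks my_pkg.pkb`, the non-zero exit status means a Git pre-commit hook or build step could call it to stop a commit or deploy. Unlike `clean_text`, it changes nothing, so a developer can decide whether a given character is legitimate.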

Hi Andy,

That's a pretty cool solution to the problem, and much more flexible than the built-in high-ASCII search in Notepad++ that I have been recommending:

Search > Find characters in range...

Doug

@Andy_Young

To show a hashmark in your text without bolding the whole line, put a backslash before it.

So, this: \# will come out as: #

Toad's Find/Replace dialog in the Editor uses PCRE regular expressions. The expression below matches most of the common odd characters found in Microsoft products, though it may need to be extended. I got it from ChatGPT, and in a quick test it matched what I expected.

[\x{2018}-\x{201F}\x{2026}\x{00A0}]
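For use outside Toad, Python's `re` module doesn't understand the PCRE `\x{2018}` syntax, so the same class has to be written with `\uXXXX` escapes. The range U+2018 to U+201F covers the curly single and double quotes but not the en dash (U+2013) or em dash (U+2014) that Doug mentioned, so this sketch adds them; presumably the same two code points could be added to the Toad pattern as `\x{2013}\x{2014}`. `MS_ODD_CHARS` and `report_odd_chars` are illustrative names only.

```python
import re

# The PCRE class above in Python's \uXXXX syntax, plus U+2013 (en dash) and
# U+2014 (em dash), which the U+2018-U+201F range does not include.
MS_ODD_CHARS = re.compile(r"[\u2013\u2014\u2018-\u201F\u2026\u00A0]")


def report_odd_chars(text):
    """Return (line_no, col_no, 'U+XXXX') for each match, with 1-based positions."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in MS_ODD_CHARS.finditer(line):
            hits.append((line_no, m.start() + 1, f"U+{ord(m.group()):04X}"))
    return hits


# A comment pasted from a Word document, followed by a clean line.
sample = "-- don\u2019t edit \u2013 see \u201cspec\u201d\u2026\nSELECT 1 FROM dual;"
for hit in report_odd_chars(sample):
    print(hit)
```

Because the class lists specific code points, it only flags the usual Word substitutions; any other non-ASCII character slips through, which is the trade-off against a catch-all such as `[^\x00-\x7F]`.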

Very cool. Thanks.