Diacritic removal

9 Feb 2018

Just wondering, before I do it myself, whether "normalizing" strings by
removing diacritics is a feature that you guys would consider interesting
to implement.
(I am aware that the results of this "normalization" can be unsatisfactory
in some cases - I'm OK with only the most obvious cases being covered.)

Basically, something like what iconv does here:

$ echo "Ã é ç" | iconv -f UTF-8 -t ASCII//TRANSLIT
A e c

I can imagine that the cases above would be pretty easy to solve using the
same hash lookup approach used for case conversion.

These, however, are examples that produce more than one symbol in output,
so they would be a little more complicated.

$ echo "æ ß" | iconv -f UTF-8 -t ASCII//TRANSLIT
ae ss
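
To make the two cases concrete, here is a rough sketch in Python (chosen only for brevity; it is not MonetDB code, and the names strip_diacritics and MULTI_CHAR are made up for illustration). Instead of a hand-built hash table for the one-to-one cases, it assumes Unicode canonical decomposition (NFD) is acceptable: the base letter is kept and the combining marks are dropped. Characters such as "æ" and "ß" have no such decomposition, so they still need an explicit lookup table with multi-character replacements. The table below is deliberately tiny and not meant to be complete.

```python
import unicodedata

# Hypothetical, incomplete table for characters that have no canonical
# decomposition and need an explicit (possibly multi-character) replacement.
MULTI_CHAR = {
    "æ": "ae",
    "Æ": "AE",
    "ß": "ss",
}


def strip_diacritics(text):
    """Return text with diacritics removed (illustrative sketch only)."""
    out = []
    for ch in text:
        # One-to-many cases: plain table lookup.
        if ch in MULTI_CHAR:
            out.append(MULTI_CHAR[ch])
            continue
        # One-to-one cases: NFD splits e.g. "é" into "e" + U+0301,
        # then every combining mark is dropped.
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)


print(strip_diacritics("Ã é ç"))  # A e c
print(strip_diacritics("æ ß"))    # ae ss
```

Unlike iconv's //TRANSLIT, this sketch leaves any other non-ASCII character (for example "ø") untouched unless it is added to the table.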

So I have two questions:
1) Is this something you guys (MonetDB developers) would be interested in
implementing?
2) If not, I'll give it a go myself. In that case, would you advise an
approach similar to the case conversion, or using the iconv library?

Thanks for your input.
Roberto

Roberto Cornacchia