String Functions

Character string operators

These apply to MonetDB SQL character string types: character, character varying, varchar, clob and subtypes: json and url.

In MonetDB all character strings are UTF-8 encoded.

OperatorDescriptionExampleResult
||concatenate two strings. Returns null when one of the strings is null'Monet' || 'DB'MonetDB
||concatenation of a string with a non-string input'Value: ' || 42Value: 42

Character string functions

FunctionReturn typeDescriptionExampleResult
ascii(s)intUnicode code point of the first character of the string sascii(R'€')8364
asciify(s)same as inputReplace all non-ASCII characters in string s with ASCII equivalentsasciify(R'a € sign')a EUR sign
char_length(s)intnumber of UTF-8 characters in UTF-8 string schar_length(R'2€')2
character_length(s)intnumber of UTF-8 characters in UTF-8 string scharacter_length(R'2€')2
charindex(substr, s)intstart position of substr in string s or 0 when not foundcharindex('gh', 'highigh')3
charindex(substr, s, i)intstart position of substr in string s starting at offset i or 0 when not foundcharindex('gh', 'highigh', 4)6
code(int)charUnicode character for the integer code pointcode(223)ß
concat(s1, s2)same as inputconcatenate strings s1 and s2. returns null when either s1 or s2 is nullconcat('Front ', 242)Front 242
greatest(s1, s2)same as inputalphabetically ordered last string value of strings s1 and s2. (same as sql_max(s1, s2))greatest('ab', 'ac')ac
insert(s1, int pos, int nr, s2)same as input arg 4insert string s2 into string s1 after position pos replacing nr characters in string s1. when pos is negative it counts from the end of the string. nr must be >= 0.insert('abcdef', 2, 1, 'XY')abXYdef
lcase(s)same as inputconvert string s to lower caselcase('MonetDB')monetdb
least(s1, s2)same as inputalphabetically ordered first string value of strings s1 and s2. (same as sql_min(s1, s2))least('ab', 'ac')ab
left(s, int n)same as input arg 1return first n characters in the stringleft('MonetDB', 3)Mon
length(s)intcount number of UTF-8 characters in UTF-8 string slength(R'2€')2
locate(s1, s2)intlocate string s1 in s2, when found return start position else 0locate('DB', 'MonetDB')6
locate(s1, s2, int pos)intlocate string s1 in s2 starting from position pos, when found return start position else 0locate('DB', 'DBMonetDB', 3)8
lower(s)same as inputconvert string s to lower caselower('MonetDB')monetdb
lpad(s, int len)same as input arg 1fill up the string to length len by prepending spaces. If the string is already longer than length then it is truncated (on the right).lpad('abc', 6)abc
lpad(s1, int len, s2)same as input arg 1fill up the string s1 to length len by prepending the characters s2. If the string s1 is already longer than length then it is truncated (on the right).lpad('abc', 6, 'DB')DBDabc
ltrim(s)varcharremove leading whitespace characters from string sltrim(E' \t\fab \tdf ')ab df
ltrim(s1, s2)varcharremove leading s2 strings from string s1ltrim('aabcdaa', 'a')bcdaa
sys.md5(s)clob(32)calculates the MD5 hash of string, returning the result as 32 character hex numbersys.md5('abc')900150983cd24fb0 d6963f7d28e17f72
sys.ms_stuff(varchar(32) s1, int pos, int nr, varchar(32) s2)varchar(32)insert string s2 into string s1 at position pos replacing nr characters in string s1. If pos <= 0 or > length of s1 then an empty string is returned. nr may be negative.sys.ms_stuff('abcdef', 2, 1, 'XY')aXYcdef
octet_length(s)intnumber of bytes in UTF-8 string soctet_length(R'2€')4
patindex(pattern, s)intsearch for pattern in string s, when found return the position of the last matched character, when not found return 0patindex('%bc%', 'abcd')3
position(s1 IN s2)intsearch for string s1 in string s2, when found return the position in s2 of the first match, when not found return 0position('DB' IN 'MonetDB')6
repeat(s, nr)same as input arg 1repeat string s the specified nr of timesrepeat('kiss,', 3)kiss,kiss,kiss,
replace(s, srch, repl)same as input arg 1replace occurrences of string srch in string s by string replreplace('abc', 'b', 'X')aXc
right(s, int n)same as input arg 1return last n characters in the stringright('MonetDB', 4)etDB
rpad(s, int)same as input arg 1fill up the string to length len by appending spaces. If the string is already longer than length then it is truncated (on the right).rpad('abc', 6) || 'End'abc End
rpad(s1, int len, s2)same as input arg 1fill up the string s1 to length len by appending the characters s2. If the string s1 is already longer than length then it is truncated (on the right).rpad('abc', 6, 'DB')abcDBD
rtrim(s)varcharremove trailing whitespace characters from string srtrim(E'ab df \t\n')ab df
rtrim(s1, s2)varcharremove trailing s2 strings from string s1rtrim('aabcdaa', 'a')aabcd
space(nr)clob(nr)generates a string with nr spacesselect 's' || space(4) || 'e's e
splitpart(s, sep, n)same as input arg 1extract substring from string s by spliting on separator string sep the field before nth occurrence. n must be greater than zerosplitpart('a|a|bc|cd', '|', 3)bc
sql_max(s1, s2)same as inputalphabetically ordered last string value of strings s1 and s2sql_max('ab', 'ac')ac
sql_min(s1, s2)same as inputalphabetically ordered first string value of strings s1 and s2sql_min('ab', 'ac')ab
startswith(s1, s2)booleanwhether string s1 starts with string s2. returns null when either s1 or s2 is nullstartswith('bcd', 'bc')true
startswith(s1, s2, icase)booleanwhether string s1 starts with string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is nullstartswith('bcd', 'BC', true)true
substr(s, startpos)same as input arg 1extract substring starting from position startpossubstr('abCde', 3)Cde
substr(s, startpos, count)clob(count)extract substring starting from position startpos for count characterssubstr('abCde', 2, 3)bCd
substring(s, startpos)same as input arg 1extract substring starting from position startpossubstring('abcde', 3)cde
substring(s, startpos, count)clob(count)extract substring starting from position startpos for count characterssubstring('abcde', 2, 2)bc
trim(s)varcharremove leading and trailing whitespace characters from string strim(E' \t\fab df \t\n')ab df
trim(s1, s2)varcharremove leading and trailing s2 strings from string s1trim('aabcdaa', 'a')bcd
"truncate"(s, n_chars)same as input arg 1truncate the string s to a maximum of n_chars characters. Note: the function name truncate need to be surrounded by double quotes."truncate"('Truncate', 4)Trun
ucase(s)same as inputconvert string s to upper caseucase('MonetDB')MONETDB
upper(s)same as inputconvert string s to upper caseupper('MonetDB')MONETDB
levenshtein(s1, s2)intcompute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Insertion/deletion and replacement costs are all 1.levenshtein('MonetDB', 'DB3')6
levenshtein(s1, s2, id, r)intcompute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.levenshtein('MonetDB', 'DB3', 1, 1)6
levenshtein(s1, s2, int id, int r, int t)intcompute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions, substitutions and transpositions) required to change one word into the other.levenshtein('MonetDB', 'DB3', 4, 2, 1)8
jarowinkler(s1, s2)doublecompute the 'Jaro-Winkler similarity' between two strings.jarowinkler('MonetDB', 'monetdb')0.8
dameraulevenshtein(s1, s2)intcompute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Insertion/deletion and replacement costs are all 1.levenshtein('MonetDB', 'DB3')6
dameraulevenshtein(s1, s2, int id, int r, int t)intcompute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.dameraulevenshtein('MonetDB', 'DB3', 1, 1, 1)6
editdistance(s1, s2)intAlias to Damerau-Levenshtein distance with insertion/deletion cost 1, replace cost 1 and transpose cost 2editdistance('MonetDB', 'DB3')6
editdistance2(s1, s2)intAlias to Damerau-Levenshtein distance with insertion/deletion cost 1, replace cost 1 and transpose cost 1editdistance2('MonetDB', 'DB3')6
soundex(s)varchar(4)computes the Soundex code from string s using a phonetic algorithmsoundex('MonetDB')M533
difference(s1, s2)intconverts two strings s1 and s2 to their soundex codes and reports the number of different code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being an exact match and four being no match.difference('MonetDB', 'DB3')4
qgramnormalize(s)same as inputnormalisation of a string s. Converts letters to uppercase and removes all characters not in a..zA..Z0..9 or spaceqgramnormalize('Prize €50!')PRIZE 50

Text analysis functions

FunctionReturn typeDescriptionExampleResult
contains(s1, s2)booleanwhether string s1 constains string s2. returns null when either s1 or s2 is nullcontains('abcd', 'bc')true
contains(s1, s2, icase)booleanwhether string s1 constains string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is nullcontains('abcd', 'BC', true)true
dameraulevenshtein(s1, s2, int insdel, int rep, int trans)intCompute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions, substitutions and transpositions) required to change one word into the other. Variable costs of insertions/deletions, substitutions and transpositions, with insdel, rep and trans respectivelydameraulevenshtein('MonetDB', 'DB3', 4, 2, 1)8
dameraulevenshtein(s1, s2)intCompute the 'Damerau-Levenshtein distance' where insertion/deletion, replacement and transposition costs are all 1dameraulevenshtein('MonetDB', 'DB3')6
difference(s1, s2)intconverts two strings s1 and s2 to their soundex codes and reports the number of different code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being an exact match and four being no match.difference('MonetDB', 'DB3')4
editdistance(s1, s2)intcompute the 'distance' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest stringeditdistance('MonetDB', 'DB3')6
editdistance2(s1, s2)intcompute the 'distance2' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest stringeditdistance2('MonetDB', 'DB3')6
endswith(s1, s2)booleanwhether string s1 ends with string s2. returns null when either s1 or s2 is nullendswith('abc', 'bc')true
endswith(s1, s2, icase)booleanwhether string s1 ends with string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is nullendswith('abc', 'BC', true)true
jarowinkler(s1, s2)doublecompute the 'Jaro-Winkler similarity' between two strings returning a double between 0 (exact match) and 1 (no similarity)jarowinkler('MonetDB', 'DB free')0.428571429
levenshtein(s1, s2, int insdel, int rep)intCompute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions or substitutions) required to change one word into the other. Variable costs of insertions/deletions or substitutions, with insdel and rep respectively.levenshtein('MonetDB', 'DB3', 4, 2)9
levenshtein(s1, s2)intCompute the 'Levenshtein distance' where insertion/deletion and replacement costs are all 1levenshtein('MonetDB', 'DB3')6
levenshtein(s1, s2, int insdel, int rep, int trans)intcompute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions, substitutions and transpositions) required to change one word into the other.levenshtein('MonetDB', 'DB3', 4, 2, 1)8
qgramnormalize(s)same as inputnormalisation of a string s. Converts letters to uppercase and removes all characters not in a..zA..Z0..9 or spaceqgramnormalize(R'Prize €50!')PRIZE 50
soundex(s)varchar(4)computes the Soundex code from string s using a phonetic algorithmsoundex('MonetDB')M533
startswith(s1, s2)booleanwhether string s1 starts with string s2. returns null when either s1 or s2 is nullstartswith('bcd', 'bc')true
startswith(s1, s2, icase)booleanwhether string s1 starts with string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is nullstartswith('bcd', 'BC', true)true

Tip: The text distance and similarity functions are grouped in the 'txtsim' module. To view them in your MonetDB server use query:

SELECT * FROM sys.functions WHERE mod = 'txtsim' AND type = 1 ORDER BY name;