These apply to MonetDB SQL character string types: character, character varying, varchar, clob and subtypes: json and url.
In MonetDB all character strings are UTF-8 encoded.
Operator | Description | Example | Result |
---|---|---|---|
|| | concatenate two strings. Returns null when one of the strings is null | 'Monet' || 'DB' | MonetDB |
|| | concatenation of a string with a non-string input | 'Value: ' || 42 | Value: 42 |
Function | Return type | Description | Example | Result |
---|---|---|---|---|
ascii(s) | int | Unicode code point of the first character of the string s | ascii(R'€') | 8364 |
asciify(s) | same as input | Replace all non-ASCII characters in string s with ASCII equivalents | asciify(R'a € sign') | a EUR sign |
char_length(s) | int | number of UTF-8 characters in UTF-8 string s | char_length(R'2€') | 2 |
character_length(s) | int | number of UTF-8 characters in UTF-8 string s | character_length(R'2€') | 2 |
charindex(substr, s) | int | start position of substr in string s or 0 when not found | charindex('gh', 'highigh') | 3 |
charindex(substr, s, i) | int | start position of substr in string s starting at offset i or 0 when not found | charindex('gh', 'highigh', 4) | 6 |
code(int) | char | Unicode character for the integer code point | code(223) | ß |
concat(s1, s2) | same as input | concatenate strings s1 and s2. returns null when either s1 or s2 is null | concat('Front ', 242) | Front 242 |
field(s, s1, s2, s3, ...) | int | Returns the index (position) of s in the s1, s2, s3, ... list. Returns NULL if str is not found. | field('foo', 'bar', 'foo', 'baz') | 1 |
greatest(s1, s2) | same as input | alphabetically ordered last string value of strings s1 and s2. (same as sql_max(s1, s2)) | greatest('ab', 'ac') | ac |
insert(s1, int pos, int nr, s2) | same as input arg 4 | insert string s2 into string s1 after position pos replacing nr characters in string s1. when pos is negative it counts from the end of the string. nr must be >= 0. | insert('abcdef', 2, 1, 'XY') | abXYdef |
lcase(s) | same as input | convert string s to lower case | lcase('MonetDB') | monetdb |
least(s1, s2) | same as input | alphabetically ordered first string value of strings s1 and s2. (same as sql_min(s1, s2)) | least('ab', 'ac') | ab |
left(s, int n) | same as input arg 1 | return first n characters in the string | left('MonetDB', 3) | Mon |
length(s) | int | count number of UTF-8 characters in UTF-8 string s | length(R'2€') | 2 |
locate(s1, s2) | int | locate string s1 in s2, when found return start position else 0 | locate('DB', 'MonetDB') | 6 |
locate(s1, s2, int pos) | int | locate string s1 in s2 starting from position pos, when found return start position else 0 | locate('DB', 'DBMonetDB', 3) | 8 |
lower(s) | same as input | convert string s to lower case | lower('MonetDB') | monetdb |
lpad(s, int len) | same as input arg 1 | fill up the string to length len by prepending spaces. If the string is already longer than length then it is truncated (on the right). | lpad('abc', 6) | abc |
lpad(s1, int len, s2) | same as input arg 1 | fill up the string s1 to length len by prepending the characters s2. If the string s1 is already longer than length then it is truncated (on the right). | lpad('abc', 6, 'DB') | DBDabc |
ltrim(s) | varchar | remove leading whitespace characters from string s | ltrim(E' \t\fab \tdf ') | ab df |
ltrim(s1, s2) | varchar | remove leading s2 strings from string s1 | ltrim('aabcdaa', 'a') | bcdaa |
sys.md5(s) | clob(32) | calculates the MD5 hash of string, returning the result as 32 character hex number | sys.md5('abc') | 900150983cd24fb0 d6963f7d28e17f72 |
sys.ms_stuff(varchar(32) s1, int pos, int nr, varchar(32) s2) | varchar(32) | insert string s2 into string s1 at position pos replacing nr characters in string s1. If pos <= 0 or > length of s1 then an empty string is returned. nr may be negative. | sys.ms_stuff('abcdef', 2, 1, 'XY') | aXYcdef |
octet_length(s) | int | number of bytes in UTF-8 string s | octet_length(R'2€') | 4 |
patindex(pattern, s) | int | search for pattern in string s, when found return the position of the last matched character, when not found return 0 | patindex('%bc%', 'abcd') | 3 |
position(s1 IN s2) | int | search for string s1 in string s2, when found return the position in s2 of the first match, when not found return 0 | position('DB' IN 'MonetDB') | 6 |
repeat(s, nr) | same as input arg 1 | repeat string s the specified nr of times | repeat('kiss,', 3) | kiss,kiss,kiss, |
replace(s, srch, repl) | same as input arg 1 | replace occurrences of string srch in string s by string repl | replace('abc', 'b', 'X') | aXc |
right(s, int n) | same as input arg 1 | return last n characters in the string | right('MonetDB', 4) | etDB |
rpad(s, int) | same as input arg 1 | fill up the string to length len by appending spaces. If the string is already longer than length then it is truncated (on the right). | rpad('abc', 6) || 'End' | abc End |
rpad(s1, int len, s2) | same as input arg 1 | fill up the string s1 to length len by appending the characters s2. If the string s1 is already longer than length then it is truncated (on the right). | rpad('abc', 6, 'DB') | abcDBD |
rtrim(s) | varchar | remove trailing whitespace characters from string s | rtrim(E'ab df \t\n') | ab df |
rtrim(s1, s2) | varchar | remove trailing s2 strings from string s1 | rtrim('aabcdaa', 'a') | aabcd |
space(nr) | clob(nr) | generates a string with nr spaces | select 's' || space(4) || 'e' | s e |
splitpart(s, sep, n) | same as input arg 1 | extract substring from string s by splitting on separator string sep the field before nth occurrence. n must be greater than zero | splitpart('a|a|bc|cd', '|', 3) | bc |
sql_max(s1, s2) | same as input | alphabetically ordered last string value of strings s1 and s2 | sql_max('ab', 'ac') | ac |
sql_min(s1, s2) | same as input | alphabetically ordered first string value of strings s1 and s2 | sql_min('ab', 'ac') | ab |
startswith(s1, s2) | boolean | whether string s1 starts with string s2. returns null when either s1 or s2 is null | startswith('bcd', 'bc') | true |
startswith(s1, s2, icase) | boolean | whether string s1 starts with string s2 matching case insensitive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null | startswith('bcd', 'BC', true) | true |
substr(s, startpos) | same as input arg 1 | extract substring starting from position startpos | substr('abCde', 3) | Cde |
substr(s, startpos, count) | clob(count) | extract substring starting from position startpos for count characters | substr('abCde', 2, 3) | bCd |
substring(s, startpos) | same as input arg 1 | extract substring starting from position startpos | substring('abcde', 3) | cde |
substring(s, startpos, count) | clob(count) | extract substring starting from position startpos for count characters | substring('abcde', 2, 2) | bc |
trim(s) | varchar | remove leading and trailing whitespace characters from string s | trim(E' \t\fab df \t\n') | ab df |
trim(s1, s2) | varchar | remove leading and trailing s2 strings from string s1 | trim('aabcdaa', 'a') | bcd |
"truncate"(s, n_chars) | same as input arg 1 | truncate the string s to a maximum of n_chars characters. Note: the function name truncate need to be surrounded by double quotes. | "truncate"('Truncate', 4) | Trun |
ucase(s) | same as input | convert string s to upper case | ucase('MonetDB') | MONETDB |
upper(s) | same as input | convert string s to upper case | upper('MonetDB') | MONETDB |
levenshtein(s1, s2) | int | compute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Insertion/deletion and replacement costs are all 1. | levenshtein('MonetDB', 'DB3') | 6 |
levenshtein(s1, s2, id, r) | int | compute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. | levenshtein('MonetDB', 'DB3', 1, 1) | 6 |
levenshtein(s1, s2, int id, int r, int t) | int | compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions, substitutions and transpositions) required to change one word into the other. | levenshtein('MonetDB', 'DB3', 4, 2, 1) | 8 |
jarowinkler(s1, s2) | double | compute the 'Jaro-Winkler similarity' between two strings. | jarowinkler('MonetDB', 'monetdb') | 0.8 |
dameraulevenshtein(s1, s2) | int | compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Insertion/deletion and replacement costs are all 1. | levenshtein('MonetDB', 'DB3') | 6 |
dameraulevenshtein(s1, s2, int id, int r, int t) | int | compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. | dameraulevenshtein('MonetDB', 'DB3', 1, 1, 1) | 6 |
editdistance(s1, s2) | int | Alias to Damerau-Levenshtein distance with insertion/deletion cost 1, replace cost 1 and transpose cost 2 | editdistance('MonetDB', 'DB3') | 6 |
editdistance2(s1, s2) | int | Alias to Damerau-Levenshtein distance with insertion/deletion cost 1, replace cost 1 and transpose cost 1 | editdistance2('MonetDB', 'DB3') | 6 |
soundex(s) | varchar(4) | computes the Soundex code from string s using a phonetic algorithm | soundex('MonetDB') | M533 |
difference(s1, s2) | int | converts two strings s1 and s2 to their soundex codes and reports the number of different code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being an exact match and four being no match. | difference('MonetDB', 'DB3') | 4 |
qgramnormalize(s) | same as input | normalisation of a string s. Converts letters to uppercase and removes all characters not in a..zA..Z0..9 or space | qgramnormalize('Prize €50!') | PRIZE 50 |
Function | Return type | Description | Example | Result |
---|---|---|---|---|
contains(s1, s2) | boolean | whether string s1 contains string s2. returns null when either s1 or s2 is null | contains('abcd', 'bc') | true |
contains(s1, s2, icase) | boolean | whether string s1 contains string s2 matching case insensitive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null | contains('abcd', 'BC', true) | true |
dameraulevenshtein(s1, s2, int insdel, int rep, int trans) | int | Compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions, substitutions and transpositions) required to change one word into the other. Variable costs of insertions/deletions, substitutions and transpositions, with insdel , rep and trans respectively | dameraulevenshtein('MonetDB', 'DB3', 4, 2, 1) | 8 |
dameraulevenshtein(s1, s2) | int | Compute the 'Damerau-Levenshtein distance' where insertion/deletion, replacement and transposition costs are all 1 | dameraulevenshtein('MonetDB', 'DB3') | 6 |
difference(s1, s2) | int | converts two strings s1 and s2 to their soundex codes and reports the number of different code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being an exact match and four being no match. | difference('MonetDB', 'DB3') | 4 |
editdistance(s1, s2) | int | compute the 'distance' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string | editdistance('MonetDB', 'DB3') | 6 |
editdistance2(s1, s2) | int | compute the 'distance2' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string | editdistance2('MonetDB', 'DB3') | 6 |
endswith(s1, s2) | boolean | whether string s1 ends with string s2. returns null when either s1 or s2 is null | endswith('abc', 'bc') | true |
endswith(s1, s2, icase) | boolean | whether string s1 ends with string s2 matching case insensitive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null | endswith('abc', 'BC', true) | true |
jarowinkler(s1, s2) | double | compute the 'Jaro-Winkler similarity' between two strings returning a double between 0 (exact match) and 1 (no similarity) | jarowinkler('MonetDB', 'DB free') | 0.428571429 |
levenshtein(s1, s2, int insdel, int rep) | int | Compute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions or substitutions) required to change one word into the other. Variable costs of insertions/deletions or substitutions, with insdel and rep respectively. | levenshtein('MonetDB', 'DB3', 4, 2) | 9 |
levenshtein(s1, s2) | int | Compute the 'Levenshtein distance' where insertion/deletion and replacement costs are all 1 | levenshtein('MonetDB', 'DB3') | 6 |
levenshtein(s1, s2, int insdel, int rep, int trans) | int | compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions, substitutions and transpositions) required to change one word into the other. | levenshtein('MonetDB', 'DB3', 4, 2, 1) | 8 |
qgramnormalize(s) | same as input | normalisation of a string s. Converts letters to uppercase and removes all characters not in a..zA..Z0..9 or space | qgramnormalize(R'Prize €50!') | PRIZE 50 |
soundex(s) | varchar(4) | computes the Soundex code from string s using a phonetic algorithm | soundex('MonetDB') | M533 |
startswith(s1, s2) | boolean | whether string s1 starts with string s2. returns null when either s1 or s2 is null | startswith('bcd', 'bc') | true |
startswith(s1, s2, icase) | boolean | whether string s1 starts with string s2 matching case insensitive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null | startswith('bcd', 'BC', true) | true |
Tip: The text distance and similarity functions are grouped in the 'txtsim' module. To view them in your MonetDB server use query:
SELECT * FROM sys.functions WHERE mod = 'txtsim' AND type = 1 ORDER BY name;