[Monetdb-developers] RFC: New Integer Atoms: byte & word
Dear MonetDB developers,

I'm almost done with the first steps of adding two new integer atoms to MonetDB:

a) a tiny (8-bit) integer.
b) a machine-word sized integer that grows with the system, i.e., 32-bit on 32-bit systems and 64-bit on 64-bit systems.

(a) is supposed to replace "chr" wherever "chr" is not used as a character type, but as a (tiny) 1-byte integer.

(b) is supposed to be the MIL pendant of the C type "var_t" and can/should be used, e.g., for BUN counts of BATs, which are limited to 32-bit on 32-bit systems, but can grow to 64-bit on 64-bit systems.

For now, I chose the following names, sticking to the 3-letter "design", and picking combinations that do not yet exist in the code base:

a) "bte" (read "byte")
b) "wrd" (read "word")

In a first check-in (hopefully sometime next week), I will only add these new atoms (and add the proper functionality & new signatures where necessary), but not change any existing MIL proc/command signatures.

In a second check-in (maybe also already next week?), I plan to change (at least) the signature(s) of "count(BAT[any,any]) : int" & "count(int) : lng" to a single "count(BAT[any,any]) : wrd".

Maybe I/we should later also consider replacing "chr" by "bte" in the enum module.

Well, so far so good.

I would be grateful if you could comment on these plans.

Especially, I'd like to hear whether there are better(?) suggestions for the names "bte" & "wrd". Further, did I miss any places where we should/must use "bte" iso. "chr" and/or "wrd" iso. "int"/"lng"?

Thank you very much in advance!

Stefan

--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
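As a rough illustration of the two atoms proposed above, here is a minimal C sketch. Only the names "bte" and "wrd" come from the proposal; mapping them onto int8_t and intptr_t, the toy_bat struct, and the toy_count function are hypothetical stand-ins chosen for this example and are not taken from the MonetDB code base.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int8_t, intptr_t */

/* Hypothetical stand-ins for the proposed atoms (not MonetDB's definitions). */
typedef int8_t   bte;   /* tiny signed 8-bit integer */
typedef intptr_t wrd;   /* signed, pointer-sized: 32-bit or 64-bit per platform */

static_assert(sizeof(bte) == 1, "bte must be exactly one byte");
static_assert(sizeof(wrd) == sizeof(void *), "wrd must match the pointer width");

/* Hypothetical toy column: only a length, standing in for a BAT's BUN count. */
typedef struct {
    size_t nbuns;
} toy_bat;

/* A single count signature whose result width follows the platform,
 * instead of separate int and lng variants. */
static wrd toy_count(const toy_bat *b)
{
    return (wrd) b->nbuns;
}
```

With a definition along these lines, a call such as toy_count(&b) yields a 32-bit result on a 32-bit build and a 64-bit result on a 64-bit build without any change to the calling code.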
On Thu, Dec 08, 2005 at 03:23:21PM +0100, Stefan Manegold wrote:
a) "bte" (read "byte") b) "wrd" (read "word")
I think "byt" might be slightly more obvious than "bte". But then, I would prefer "byte" (gasp), "word" (shiver) and "long" (shock horror) over "bte", "wrd" and "lng" anyway. I am aware that the latter is probably out of the question due to backward compatibility.
Joeri
On Thu, Dec 08, 2005 at 05:09:14PM +0100, Joeri van Ruth wrote:
[...]
But then, I would prefer "byte" (gasp), "word" (shiver) and "long" (shock horror) over "bte", "wrd" and "lng" anyway.
[...]
I second this. Is there some (technical) reason to use only three-character identifiers instead of readable four-character ones? (We could spend those couple of characters here that we saved on replacing all those "de_NO_project" and the like... ;-) )
Jens
--
Jens Teubner
Technische Universitaet Muenchen, Department of Informatics
D-85748 Garching, Germany
Tel: +49 89 289-17259 Fax: +49 89 289-17263
The purpose of most computer languages is to lengthen your resume by a word and a comma. -- Larry Wall
On 12/8/05 5:31 PM, Jens Teubner wrote with possible deletions:
On Thu, Dec 08, 2005 at 05:09:14PM +0100, Joeri van Ruth wrote:
[...]
But then, I would prefer "byte" (gasp), "word" (shiver) and "long" (shock horror) over "bte", "wrd" and "lng" anyway.
[...]
I second this. Is there some (technical) reason to use only three character identifiers instead of readable four character ones?
Yes, please use 'byt' and 'wrd': 'bte' triggers nothing but the 'bite' association, for me at least. And we're all bitten by MonetDB enough already, he he. Cheers, --Teggy -- | Prof. Dr. Torsten Grust grust@in.tum.de | | http://www-db.in.tum.de/~grust/ | | Database Systems - Technische Universität München (Germany) |
On 08-12-2005 17:38:34 +0100, Torsten Grust wrote:
On 12/8/05 5:31 PM, Jens Teubner wrote with possible deletions:
On Thu, Dec 08, 2005 at 05:09:14PM +0100, Joeri van Ruth wrote:
[...]
But then, I would prefer "byte" (gasp), "word" (shiver) and "long" (shock horror) over "bte", "wrd" and "lng" anyway.
[...]
I second this. Is there some (technical) reason to use only three character identifiers instead of readable four character ones?
Yes, please use 'byt' and 'wrd': 'bte' triggers nothing but the 'bite' association, for me at least. And we're all bitten by MonetDB enough already, he he.
What about 'oct' and 'reg'? :p
On Thu, Dec 08, 2005 at 03:23:21PM +0100, Stefan Manegold wrote:
Dear MonetDB developers,
I'm almost done with the first steps of adding two new integer atoms to MonetDB:
a) a tiny (8-bit) integer.
b) a machine-word sized integer that grows with the system, i.e., 32-bit on 32-bit systems and 64-bit on 64-bit systems.
(a) is supposed to replace "chr" wherever "chr" is not used as a character type, but as a (tiny) 1-byte integer.
(b) is supposed to be the MIL pendant of the C type "var_t" and can/should be used, e.g., for BUN counts of BATs which are limited to 32-bit on 32-bit systems, but can grow to 64-bit on 64-bit systems.
For now, I chose the following names, sticking to the 3-letter "design", and picking combinations that do not yet exist in the code base:
a) "bte" (read "byte") b) "wrd" (read "word")
In a first check-in (hopefully sometime next week), I will only add these new atoms (and add the proper functionality & new signatures where necessary), but not change any existing MIL proc/command signatures.
In a second check-in (maybe also already next week?), I plan to change (at least) the signature(s) of "count(BAT[any,any]) : int" & "count(int) : lng" to a single "count(BAT[any,any]) : wrd".
Maybe, I/we should later also consider to replace "chr" by "bte" in the enum module.
Well, so far so good.
I would be grateful, if you could comment on these plans.
Especially, I'd like to hear, whether there are better(?) suggestions for the names "bte" & "wrd". Further, did I miss any places, where we should/must use "bte" iso. "chr" and/or "wrd" iso. "int"/"lng"?
Hey, weren't you on holiday? ;-) I have no objections to bte/wrd. For now I also do not see any more places where chr/int/lng need to be replaced with the new bte/wrd. Anyway, if we find more later, these can simply be fixed.
Niels
Thank you very much in advance!
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Do you grep through log files for problems? Stop! Download the new AJAX search engine that makes searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Stefan Manegold wrote:
Dear MonetDB developers,
I'm almost done with the first steps of adding two new integer atoms to MonetDB:
a) a tiny (8-bit) integer.
Signed or unsigned? name: tint
Why not use int8, which also allows for nibbles: int4, int2?
b) a machine-word sized integer that grows with the system, i.e., 32-bit on 32-bit systems and 64-bit on 64-bit systems.
(a) is supposed to replace "chr" wherever "chr" is not used as a character type, but as a (tiny) 1-byte integer.
please keep 'chr', I don't mind if we use 'char' instead
(b) is supposed to be the MIL pendant of the C type "var_t" and can/should be used, e.g., for BUN counts of BATs which are limited to 32-bit on 32-bit systems, but can grow to 64-bit on 64-bit systems.
For now, I chose the following names, sticking to the 3-letter "design", and picking combinations that do not yet exist in the code base:
use full words, there is (implicit) design reason for it, but at a few places we may have to look a little further. For the CPU cycle freaks, you might use 'tiny' instead of 'bte', because 'bit' also starts with a 'b'
a) "bte" (read "byte") b) "wrd" (read "word")
In a first check-in (hopefully sometime next week), I will only add these new atoms (and add the proper functionality & new signatures where necessary), but not change any existing MIL proc/command signatures.
In a second check-in (maybe also already next week?), I plan to change (at least) the signature(s) of "count(BAT[any,any]) : int" & "count(int) : lng" to a single "count(BAT[any,any]) : wrd".
Introduction of more atomic types leads to more coercions and signatures in M5. It may actually force me to introduce the union type property to keep things readable. But, if we have a union type constructor, why not make it a first-class citizen?
command bat.insert(b:bat[:any_1{type=void,int},:any_2{type=str,chr}....
Maybe, I/we should later also consider to replace "chr" by "bte" in the enum module.
Well, so far so good.
I would be grateful, if you could comment on these plans.
Especially, I'd like to hear, whether there are better(?) suggestions for the names "bte" & "wrd". Further, did I miss any places, where we should/must use "bte" iso. "chr" and/or "wrd" iso. "int"/"lng"?
Thank you very much in advance!
Stefan
On Thu, Dec 08, 2005 at 06:23:12PM +0100, Martin Kersten wrote:
Stefan Manegold wrote:
Dear MonetDB developers,
I'm almost done with the first steps of adding two new integer atoms to MonetDB:
a) a tiny (8-bit) integer.
Signed or unsigned? name: tint
All integers in MonetDB are signed, i.e., for consistency it should be signed. I prefer bte over tint.
Why not use int8, which also allows for nibbles: int4, int2?
b) a machine-word sized integer that grows with the system, i.e., 32-bit on 32-bit systems and 64-bit on 64-bit systems.
(a) is supposed to replace "chr" wherever "chr" is not used as a character type, but as a (tiny) 1-byte integer.
please keep 'chr', I don't mind if we use 'char' instead
Stefan did not suggest removing 'chr', only replacing 'chr' wherever it is used not as a character but as a 1-byte int.
(b) is supposed to be the MIL pendant of the C type "var_t" and can/should be used, e.g., for BUN counts of BATs which are limited to 32-bit on 32-bit systems, but can grow to 64-bit on 64-bit systems.
For now, I chose the following names, sticking to the 3-letter "design", and picking combinations that do not yet exist in the code base:
use full words, there is (implicit) design reason for it, but at a few places we may have to look a little further. For the CPU cycle freaks, you might use 'tiny' instead of 'bte', because 'bit' also starts with a 'b'
a) "bte" (read "byte") b) "wrd" (read "word")
In a first check-in (hopefully sometime next week), I will only add these new atoms (and add the proper functionality & new signatures where necessary), but not change any existing MIL proc/command signatures.
In a second check-in (maybe also already next week?), I plan to change (at least) the signature(s) of "count(BAT[any,any]) : int" & "count(int) : lng" to a single "count(BAT[any,any]) : wrd".
Introduction of more atomic types leads to more coercions and signatures in M5. It may actually force me to introduce the union type property to keep things readable. But, if we have a union type constructor, why not make it a first-class citizen?
command bat.insert(b:bat[:any_1{type=void,int},:any_2{type=str,chr}....
Niels
Maybe, I/we should later also consider to replace "chr" by "bte" in the enum module.
Well, so far so good.
I would be grateful, if you could comment on these plans.
Especially, I'd like to hear, whether there are better(?) suggestions for the names "bte" & "wrd". Further, did I miss any places, where we should/must use "bte" iso. "chr" and/or "wrd" iso. "int"/"lng"?
Thank you very much in advance!
Stefan
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Dear all,

thank you very much for all your reactions.

Since I'm indeed (supposed to be) on holidays, here's just a quick reply; all further action (at least from my side) will not happen before next week:

- the two new types are/will be signed; the way NIL is stored and treated in MonetDB keeps us from having unsigned integer types (unless we omit a whole bit as with OID)
- I'll keep the name discussion open until next week, but I'd also vote for "byte" and "word"
- the two types are additions; all existing types will continue to exist as they are now; we might only want to use "byte" iso. "chr" where we need a 1-byte integer (as opposed to a (1-byte) character).
- the uchr attempt had failed, and uchr had disappeared again some time ago
- I cannot judge the extra signature/type checking overhead in M5, but I found it no problem in M4
- I wouldn't mind considering other name changes and further clean-up, but I'd recommend keeping them separate from the introduction of "byte" and "word" (at least make sure we have separate, non-overlapping check-ins)
- the same holds for improved printing of atoms.
- for M4, we should (try to) keep an eye on backward compatibility, at least as far as it does not block required/desired changes; M5 indeed breaks any compatibility below SQL/XQuery (apart from the dbfarms?), hence, we're free to do any "cleanup" there...
- "double" sizes are not required; e.g., "sum_lng(BAT[any,sht]) : lng" and alike already exist for quite some time, and obviously, there will also be "sum_wrd(BAT[any,sht]) : lng" and alike ...

Ok. That's all for now. Back to "holidays" ;-)

Stefan

On Thu, Dec 08, 2005 at 07:06:29PM +0100, Niels Nes wrote:
On Thu, Dec 08, 2005 at 06:23:12PM +0100, Martin Kersten wrote:
Stefan Manegold wrote:
Dear MonetDB developers,
I'm almost done with the first steps of adding two new integer atoms to MonetDB:
a) a tiny (8-bit) integer.
Signed or unsigned? name: tint
All integers in MonetDB are signed, i.e., for consistency it should be signed. I prefer bte over tint.
Why not use int8, which also allows for nibbles: int4, int2?
b) a machine-word sized integer that grows with the system, i.e., 32-bit on 32-bit systems and 64-bit on 64-bit systems.
(a) is supposed to replace "chr" wherever "chr" is not used as a character type, but as a (tiny) 1-byte integer.
please keep 'chr', I don't mind if we use 'char' instead
Stefan did not suggest removing 'chr', only replacing 'chr' wherever it is used not as a character but as a 1-byte int.
(b) is supposed to be the MIL pendant of the C type "var_t" and can/should be used, e.g., for BUN counts of BATs which are limited to 32-bit on 32-bit systems, but can grow to 64-bit on 64-bit systems.
For now, I chose the following names, sticking to the 3-letter "design", and picking combinations that do not yet exist in the code base:
use full words, there is (implicit) design reason for it, but at a few places we may have to look a little further. For the CPU cycle freaks, you might use 'tiny' instead of 'bte', because 'bit' also starts with a 'b'
a) "bte" (read "byte") b) "wrd" (read "word")
In a first check-in (hopefully sometime next week), I will only add these new atoms (and add the proper functionality & new signatures where necessary), but not change any existing MIL proc/command signatures.
In a second check-in (maybe also already next week?), I plan to change (at least) the signature(s) of "count(BAT[any,any]) : int" & "count(int) : lng" to a single "count(BAT[any,any]) : wrd".
Introduction of more atomic types leads to more coercions and signatures in M5. It may actually force me to introduce the union type property to keep things readable. But, if we have a union type constructor, why not make it a first-class citizen?
command bat.insert(b:bat[:any_1{type=void,int},:any_2{type=str,chr}....
Niels
Maybe, I/we should later also consider to replace "chr" by "bte" in the enum module.
Well, so far so good.
I would be grateful, if you could comment on these plans.
Especially, I'd like to hear, whether there are better(?) suggestions for the names "bte" & "wrd". Further, did I miss any places, where we should/must use "bte" iso. "chr" and/or "wrd" iso. "int"/"lng"?
Thank you very much in advance!
Stefan
--
Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
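Stefan's remarks above about NIL and about "sum_lng" over a "sht" column can be illustrated with a small C sketch. It rests on assumptions the thread does not spell out: that NIL is stored as the smallest value of each signed type, and that a sum skips NIL values. The names BTE_NIL, SHT_NIL, bte_is_nil, and toy_sum_lng are hypothetical and only serve this example.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int8_t, int16_t, int64_t and their limits */

/* Hypothetical stand-ins for the atoms named in the thread. */
typedef int8_t  bte;
typedef int16_t sht;
typedef int64_t lng;

/* Assumption: NIL is the smallest value of the signed type, so one bit
 * pattern is reserved and the usable range of bte is -127..127. */
#define BTE_NIL INT8_MIN
#define SHT_NIL INT16_MIN

static int bte_is_nil(bte v)
{
    return v == BTE_NIL;
}

/* Sum 16-bit values into a 64-bit accumulator, so the result cannot
 * overflow the narrow input type. Skipping NIL is an assumption of
 * this sketch, not something stated in the thread. */
static lng toy_sum_lng(const sht *vals, size_t n)
{
    lng total = 0;
    for (size_t i = 0; i < n; i++) {
        if (vals[i] == SHT_NIL)
            continue;
        total += vals[i];
    }
    return total;
}
```

The point of the wide accumulator is that the result type is chosen by the caller (here "lng") rather than being derived as a "double size" version of the input type.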
participants (7)
- Fabian Groffen
- Jens Teubner
- Joeri van Ruth
- Martin Kersten
- Niels Nes
- Stefan Manegold
- Torsten Grust