Announcement: New Feb2013 Feature release of MonetDB suite

The MonetDB team at CWI/MonetDB BV is pleased to announce the Feb2013 feature release of the MonetDB suite of programs.

More information about MonetDB can be found on our website at http://www.monetdb.org/.

For details on this release, please see the release notes at http://www.monetdb.org/Downloads/ReleaseNotes.

As usual, the download location is http://dev.monetdb.org/downloads/.

Feb 2013 feature release

Testing Environment
* Enabled "top-level" Mtest.py. So far, Mtest.py could be called in any subdirectory of the MonetDB source tree (and would then run all tests in that sub-tree), but it was not possible to call Mtest.py in the top-level MonetDB source directory to run all tests. Instead, to run all tests, Mtest.py had to be called at least 4 times, once in each of these directories: "clients", "monetdb5", "sql", "geom". Now it is possible to call Mtest.py once in the top-level MonetDB source directory to run all tests in one go. The behaviour of calling Mtest.py in any subdirectory, including the four mentioned above, has not changed, other than that the now-obsolete command line options "-p / --package <package>" and "-5 / --monetdb5" have been removed.

Java Module
* merocontrol was changed to return server URIs and lastStop time. Connections and dbpath were removed.
* Mapi protocol v8 support was removed from MapiSocket. Protocol v8 has not been used by the servers since the Apr2012 release.

Client Package
* Mapi protocol v8 support was removed from all client drivers. Protocol v8 has not been used by the servers since the Apr2012 release.
* The tool mnc was removed from installations.
* msqldump: Implemented an option (--table/-t) to dump a single table.
* Changed msqldump's trace option to be in line with mclient. In both cases, the long option is --Xdebug and the short option is -X.

MonetDB5 Server
* mserver5: The --dbname and --dbfarm options have been replaced by the single --dbpath option.
* The scheduler of mserver5 was changed to use a fixed set of workers to perform the work for all connected clients. Previously, each client connection had its own set of workers, easily causing resource problems upon multiple connections to the server.

Merovingian
* Upgrade support for dbfarms from Mar2011 and Aug2011 was dropped.
* monetdb status now uses a more condensed output, to cater for the URIs being shown, and prints how long a database has been stopped, or how long ago it crashed.
* monetdb status now prints the connection URI for each database, when available. The connections and database path properties have been dropped.
* monetdb status now prints the last crash date only if the database has not been started since.

Bug Fixes
* 2291: small doubles end up as NULL after arithmetic
* 3215: Calculation Date function using interval and year
* 3033: stethoscope needs better documentation
* 3084: Timestamp arithmetic very slow (especially on Windows)
* 3125: Python tests fail after recent Python API changes
* 3172: assertion fails on table function with subselects as parameters
* 3178: one scan is enough to implement ALGstdev_@1 in monetdb5/modules/kernel/algebra.mx
* 3179: LIKE: batstr.like+algebra.uselect called instead of pcre.like_filter
* 3193: Expressions not supported in the GROUP BY or ORDER BY clause.
* 3216: "unknown property" error setting format and width in .monetdb file
* 3217: gdk_posix fails to compile under NetBSD
* 3221: can no execute large statements
* 3227: MT_set_lock() call on an non-initialized lock
Hi,

Did the UDF group-by API change? I just tried to upgrade my custom UDFs and am getting an error about a 'sub' function:

    batudf.*sub*hllagg' undefined in: _37:bat[:any,:str] := batudf.subhllagg(_34:bat[:oid,:str], _18:bat[:oid,:oid], r1_18:bat[:oid,:oid], _38:bit)

As usual, I checked the diffs in the xmlagg that comes with MonetDB. There are some new functions whose purpose I can't really tell:

    AGGRsubxml
    AGGRsubxmlcand
    BATxmlaggr

Can someone kindly explain the changes? It seems to be a breaking change, and I couldn't find anything in the docs.

I also removed the BATaccessBegin and BATaccessEnd macro calls; it seems they are not needed anymore.

Thanks
Take a look at http://www.monetdb.org/content/internal-changes-feb2013-release for some information about what has changed. It doesn't help you directly, but it does give some background. You may be able to figure out what needs to be changed, or at least where to look.

-- Sjoerd Mullender
Hello,

On 02/21/2013 11:43 AM, Sjoerd Mullender wrote:
> Take a look at http://www.monetdb.org/content/internal-changes-feb2013-release for some information about what has changed. It doesn't help you directly, but it does give some background. You may be able to figure out what needs to be changed, or at least where to look.

I feel that this is an area that deserves to be improved in MonetDB. Is it really necessary to have a deep knowledge of the MonetDB internals in order to create a custom aggregate function? Can we have a "stable" interface to code against?

I have developed "custom aggregates" for PostgreSQL, MS SQL Server and H2. Please compare and contrast the simplicity of developing a custom aggregate in these databases:

PostgreSQL: http://www.postgresql.org/docs/9.2/static/sql-createaggregate.html
MS-SQL Server: http://technet.microsoft.com/en-us/library/ms131051%28v=sql.90%29.aspx
H2: http://www.h2database.com/javadoc/org/h2/api/AggregateFunction.html

The MonetDB equivalent is ... well... grep'ing through source code, searching the mailing lists, lots of trial and error, interpreting cryptic error messages (if we are lucky enough to have one), stepping through debuggers and in general having a miserable experience.

Anyway... I don't want to sound ungrateful. MonetDB really is something amazing and you guys absolutely rock for offering the world the fruits of your intellect, but please be aware that for most of us mere mortals creating custom aggregates in MonetDB is severely non-trivial.

I was reading through the above blog post and thinking "cool!... very cool! ... wow! even faster!... amazing" and then it hit me... "crap! I will have to port the custom aggregates to this version... there goes my week".

It would really help to have a small and simple example on how to do it (and no, that XMLagg monstrosity doesn't count)... something simple like a string concatenation function... with easy steps like:

1 - this is the interface you must implement
2 - look at this sample implementation (with nice comments that explain what is happening)
3 - this is how to build it
4 - this is how you register the function
5 - profit!

Best regards.

-- Luis Neves
Dear Luis

Thank you for your feedback and appreciation. Indeed, the interface and documentation to UDFs can be improved, but part of the speed comes from being able to 'tap into the kernel'. It is an area where the bare system is opened up and expert knowledge is sometimes required. We will take your suggestions into account to extend the set of UDF patterns supported by recipes on how-to-s.

regards, Martin
On 2013-02-20 19:52, Miguel Ping wrote:
> as usual, I checked the diffs in the xmlagg that comes with monet, there's some new functions that I can't really tell what they are for:
> AGGRsubxml AGGRsubxmlcand BATxmlaggr
> Can someone kindly explain the changes? It seems a breaking change, I couldn't find anything on the docs.
BATxmlaggr does the real work. As you can see, it's a static function, so not used by any other code. AGGRsubxml and AGGRsubxmlcand are the functions that are called from the MAL level. As you can see, the former calls the latter with a NULL argument for the sid parameter. The sid parameter represents the optional candidates list that I referred to in my blog post.

AGGRsubxmlcand just converts the bat identifiers to BAT pointers and calls BATxmlaggr to do the real work. Afterwards it cleans up and returns the result.

And now what is the real work?

b is the BAT being aggregated. It is dense-headed, and in the case of this function, the tail holds the XML values.

g is the groups BAT. Again, it is dense-headed and aligned with b. The tail is oid and it indicates the group each element of b is a member of. Equal values in g mean the same group. If g is NULL, there are no groups, or rather, all values are in the same group, and e is not used.

e is the extents BAT. The head is again dense, but it is not aligned with b and g. Instead, it contains the range of group ids that are used in g. The tail is not used. e can be NULL; that just makes the code less efficient.

s is the candidates list and can be NULL. If it is specified, it is, again, dense-headed. The tail is oid, sorted in ascending order and without duplicates. All values must be in the range of the head columns of b and g. It indicates the values that participate. If s is not specified, all values in b/g participate, otherwise only the values mentioned in s.

Finally, skip_nils says what to do with nil values in the tail of b: ignore them (act as if the value isn't there), or take them along in the aggregation (with most aggregations that would mean the result would be nil).

I hope this helps.
> I also removed the BATaccessBegin and BATaccessEnd macro calls, it seems they are not needed anymore.

That is correct.

-- Sjoerd Mullender
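The roles of b, g, e, s and skip_nils described in the message above can be sketched with plain C arrays. This is only an illustration under stated assumptions, not the GDK API: the names grouped_sum, grouped_sum_nocand and NIL_INT are hypothetical, array positions stand in for the dense head, the group count ngrp stands in for what the extents BAT e provides, and an integer sum replaces the XML aggregation. The wrapper mirrors the way AGGRsubxml is said to call AGGRsubxmlcand with a NULL candidates list.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical nil marker for this sketch only. */
#define NIL_INT INT_MIN

/*
 * Grouped sum over plain arrays (illustration only, no GDK types).
 *   b         values being aggregated; position i plays the dense head
 *   n         number of entries in b (and in g, which is aligned with b)
 *   g         group id per position, or NULL when everything is one group
 *   ngrp      number of groups (the information the extents BAT would give)
 *   s         ascending, duplicate-free positions that participate, or NULL
 *   ns        number of entries in s
 *   skip_nils nonzero: ignore nil values; zero: a nil makes the group nil
 *   out       receives ngrp results (one result when g is NULL)
 * Groups that receive no value end up as 0 in this sketch.
 */
static void grouped_sum(const int *b, size_t n,
                        const size_t *g, size_t ngrp,
                        const size_t *s, size_t ns,
                        int skip_nils, int *out)
{
    size_t cnt = s ? ns : n;   /* how many positions we visit */
    size_t i;

    assert(b != NULL && out != NULL);
    if (g == NULL)
        ngrp = 1;              /* no groups: all values share group 0 */
    for (i = 0; i < ngrp; i++)
        out[i] = 0;

    for (i = 0; i < cnt; i++) {
        size_t pos = s ? s[i] : i;       /* candidate list selects positions */
        size_t grp;

        if (pos >= n)
            continue;                    /* candidate outside b: ignore */
        grp = g ? g[pos] : 0;
        if (grp >= ngrp)
            continue;                    /* group id outside the extents */
        if (out[grp] == NIL_INT)
            continue;                    /* group result is already nil */
        if (b[pos] == NIL_INT) {
            if (!skip_nils)
                out[grp] = NIL_INT;      /* nil poisons the whole group */
            continue;
        }
        out[grp] += b[pos];
    }
}

/* Variant without a candidates list: forwards with s == NULL. */
static void grouped_sum_nocand(const int *b, size_t n,
                               const size_t *g, size_t ngrp,
                               int skip_nils, int *out)
{
    grouped_sum(b, n, g, ngrp, NULL, 0, skip_nils, out);
}
```

With values {1, 2, nil, 4, 5} and group ids {0, 1, 0, 1, 0}, skipping nils gives 6 for both groups, while not skipping them makes group 0 nil. Overflow handling and empty-group semantics are left out on purpose to keep the sketch short.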
OK, let me take some time to digest that. Some of the terms you use are only vaguely meaningful to me, but it sure helps to understand the code.

Thanks for the explanation.
I'm hitting a segfault. Can the following line ever return NULL for a non-NULL 'g'? I can't understand how this happens. I'm grouping a table with only one element:

    grps = (const oid *) Tloc(g, BUNfirst(g));

Afterwards, accessing grps throws a segfault:

    ...
    prev = grps[0]; // segfaults

+------+------+
| id   | data |
+======+======+
| a    |   11 |
+------+------+
sql>select id, hllagg(data) from test group by id;
(BOOM!) segfaults.

Thanks
On 2013-02-21 18:07, Miguel Ping wrote:
> I'm hitting a segfault. Can the following line ever return NULL for non-null 'g'? Can't understand how this happens. I'm grouping a table with only one element:
>     grps = (const oid *) Tloc(g, BUNfirst(g));
Yes, that can return NULL. If g's tail column is TYPE_void, this will return NULL.
-- Sjoerd Mullender
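The reply above says the pointer is NULL when the tail column of g is TYPE_void. A minimal sketch of the defensive pattern this suggests follows, assuming that such a column is a virtual dense sequence whose values are computed rather than stored. The types column_t and oid_t and the function column_get are hypothetical stand-ins for this illustration, not GDK definitions; the actual fix is whatever the changeset mentioned in the thread contains.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t oid_t;   /* hypothetical stand-in for an oid */

/* Hypothetical stand-in for the tail column of a groups BAT. */
typedef struct {
    const oid_t *values;  /* stored values, or NULL for a virtual column */
    oid_t seqbase;        /* first value of the virtual dense sequence */
    size_t count;         /* number of entries */
} column_t;

/*
 * Read entry i without assuming the values are materialized.
 * When values is NULL the column is treated as the dense sequence
 * seqbase, seqbase + 1, ..., so nothing is dereferenced.
 */
static oid_t column_get(const column_t *c, size_t i)
{
    assert(c != NULL && i < c->count);
    return c->values ? c->values[i] : c->seqbase + (oid_t) i;
}
```

In the one-row case from the question, the single group id would presumably be 0, which such a virtual column can represent without storing anything; reading through an accessor like this avoids indexing a NULL array.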
On 2013-02-21 21:49, Sjoerd Mullender wrote:
> On 2013-02-21 18:07, Miguel Ping wrote:
>> I'm hitting a segfault. Can the following line ever return NULL for non-null 'g'? Can't understand how this happens. I'm grouping a table with only one element:
>>     grps = (const oid *) Tloc(g, BUNfirst(g));
> Yes, that can return NULL. If g's tail column is TYPE_void, this will return NULL.
See changeset f1f66b64f0bc for a fix (I hope).
Afterwards, accessing grps throws a segfault:
... prev = grps[0];//segfaults
+------+------+
| id   | data |
+======+======+
| a    | 11   |
+------+------+
sql>select id, hllagg(data) from test group by id;
(BOOM!) segfaults.
Thanks
On 02/21/2013 03:51 PM, Miguel Ping wrote:
OK, let me take some time to digest that. Some of the terms you use are only vaguely meaningful to me, but it sure helps to understand the code.
Thanks for the explanation.
On 02/21/2013 03:44 PM, Sjoerd Mullender wrote:
On 2013-02-20 19:52, Miguel Ping wrote:
> Hi,
>
> Did the UDF group-by API change? I just tried to upgrade my custom UDFs, and I am getting an error about a 'sub':
>
>> batudf.*sub*hllagg' undefined in: _37:bat[:any,:str] := batudf.subhllagg(_34:bat[:oid,:str], _18:bat[:oid,:oid], r1_18:bat[:oid,:oid], _38:bit)
>
> As usual, I checked the diffs in the xmlagg that comes with monet; there are some new functions that I can't really tell what they are for:
>
> AGGRsubxml AGGRsubxmlcand BATxmlaggr
>
> Can someone kindly explain the changes? It seems a breaking change, and I couldn't find anything in the docs.
BATxmlaggr does the real work. As you can see, it's a static function, so not used by any other code. AGGRsubxml and AGGRsubxmlcand are the functions that are called from the MAL level. As you can see, the former calls the latter with a NULL argument for the sid parameter. The sid parameter represents the optional candidates list that I referred to in my blog post.
AGGRsubxmlcand just converts the bat pointers to BAT pointers and calls BATxmlaggr to do the real work. Afterwards it cleans up and returns the result.
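The layering described here (a thin entry point without candidates, an entry point that resolves ids to pointers, and a static worker) can be sketched in isolation. The following is only a model under stated assumptions: the column struct, the registry table, and the names col_fix, col_unfix, do_sum, agg_sub and agg_subcand are all hypothetical and stand in for the BAT ids, BAT pointers and functions of the real module.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a BAT: a column of ints with a reference count. */
typedef struct { const int *vals; size_t n; int refs; } column;

#define NCOLS 4
static column registry[NCOLS];  /* hypothetical id -> column table */

/* Resolve an id to a pointer and take a reference; NULL if unknown. */
static column *col_fix(int id)
{
    if (id < 0 || id >= NCOLS || registry[id].vals == NULL)
        return NULL;
    registry[id].refs++;
    return &registry[id];
}

/* Drop a reference taken by col_fix; NULL is accepted and ignored. */
static void col_unfix(column *c)
{
    if (c) {
        assert(c->refs > 0);
        c->refs--;
    }
}

/* The static worker: sums b, restricted to the rows listed in s if given. */
static long do_sum(const column *b, const column *s)
{
    long sum = 0;
    size_t n = s ? s->n : b->n;
    for (size_t k = 0; k < n; k++) {
        size_t row = s ? (size_t) s->vals[k] : k;
        if (row < b->n)  /* ignore candidates outside b */
            sum += b->vals[row];
    }
    return sum;
}

/* Entry point with candidates: convert ids, call the worker, clean up. */
int agg_subcand(long *ret, const int *bid, const int *sid)
{
    column *b = col_fix(*bid);
    column *s = NULL;
    if (b == NULL)
        return -1;
    if (sid != NULL && (s = col_fix(*sid)) == NULL) {
        col_unfix(b);
        return -1;
    }
    *ret = do_sum(b, s);
    col_unfix(b);
    col_unfix(s);
    return 0;
}

/* Entry point without candidates: delegate with a NULL sid. */
int agg_sub(long *ret, const int *bid)
{
    return agg_subcand(ret, bid, NULL);
}
```

Keeping the worker static means only the two entry points need to agree with the signatures registered at the MAL level; the worker can change freely.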
-- Sjoerd Mullender
On 2013-02-21 22:01, Sjoerd Mullender wrote:
On 2013-02-21 21:49, Sjoerd Mullender wrote:
On 2013-02-21 18:07, Miguel Ping wrote:
I'm hitting a segfault. Can the following line ever return NULL for non-null 'g'? Can't understand how this happens. I'm grouping a table with only one element:
grps = (const oid *) Tloc(g, BUNfirst(g));
Yes, that can return NULL. If g's tail column is TYPE_void, this will return NULL.
See changeset f1f66b64f0bc for a fix (I hope).
Forget this. Thinko. What needs to happen in case of a dense-tailed g is just return b since the groups all contain a single row.
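To make the failure mode and the shortcut concrete, here is a small standalone model. It assumes a column whose tail is either materialized (a real array) or "void" (no array, values implied by a starting value), which is how the NULL from Tloc is explained above. The oid_column struct and the names group_of and group_max are hypothetical, and a per-group maximum stands in for the actual aggregate.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Model of an oid column: tail == NULL means the values are implied. */
typedef struct {
    const size_t *tail;  /* NULL for a void (dense) tail */
    size_t seqbase;      /* first implied value when tail == NULL */
    size_t count;
} oid_column;

/* Group id of row i; never dereferences a NULL tail. */
size_t group_of(const oid_column *g, size_t i)
{
    return g->tail ? g->tail[i] : g->seqbase + i;
}

/*
 * Per-group maximum of b (aligned with g).
 * If g is dense, every row is its own group, so b already is the
 * answer and is returned as-is. Otherwise out[0..ngrp-1] is filled
 * and returned. Returns NULL if a group id is out of range.
 */
const int *group_max(const int *b, const oid_column *g,
                     size_t ngrp, int *out)
{
    assert(b != NULL && g != NULL && out != NULL);
    if (g->tail == NULL)
        return b;  /* one row per group: nothing to aggregate */

    for (size_t i = 0; i < ngrp; i++)
        out[i] = INT_MIN;
    for (size_t i = 0; i < g->count; i++) {
        size_t grp = group_of(g, i);
        if (grp >= ngrp)
            return NULL;
        if (b[i] > out[grp])
            out[grp] = b[i];
    }
    return out;
}
```

The single-row table from the report fits this model: one row means one group, the groups column can be represented without a materialized tail, and indexing into it directly is what crashes.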
-- Sjoerd Mullender
Just applied your patch, and it worksforme. Thanks! On 02/21/2013 09:19 PM, Sjoerd Mullender wrote:
>> See changeset f1f66b64f0bc for a fix (I hope).
> Forget this. Thinko. What needs to happen in case of a dense-tailed g is just return b since the groups all contain a single row.
Participants (5): Luis Neves, Martin Kersten, Miguel Ping, Sjoerd Mullender, Sjoerd Mullender