[Monetdb-developers] MAPI protocol suggestion
Hi,

In light of my recent philosophical question on doubles/uints for the lossy storage of coordinates, I wonder the following: would it be possible to implement a binary MAPI protocol that can use the native types of the data stored within MonetDB?

For example, I create a table:

CREATE TABLE helloworld (hello varchar(13), world integer);
INSERT INTO helloworld (hello, world) VALUES ('hoi', 1);

And fetch this table:

SELECT hello, world FROM helloworld;

And receive the MAPI data in the following (too simplistic) way:

(len) (string)
--> 5 hello 5 world
--> 3 hoi 4 0x0 0x0 0x0 0x1

For several reasons the above will not work: the length field is too short (it only goes up to 255), and it is not optimal either. But I wonder if it can trigger any ideas for allowing the end user to use native types instead of char[]'s.

Stefan
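To make the proposed format concrete, here is a minimal C sketch of the encoding described above. It is purely illustrative: the table, the column names, and the one-byte length prefix are taken from the example, and big-endian byte order is assumed from the "0x0 0x0 0x0 0x1" notation; none of this is an existing MAPI feature.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl */

/* Append a length-prefixed string; the 1-byte prefix reproduces the
   255-byte limitation noted above. */
static size_t put_str(uint8_t *buf, const char *s)
{
    uint8_t len = (uint8_t) strlen(s);
    buf[0] = len;
    memcpy(buf + 1, s, len);
    return 1 + (size_t) len;
}

/* Append a length-prefixed 32-bit integer in network (big-endian) order,
   matching the "4 0x0 0x0 0x0 0x1" example. */
static size_t put_int32(uint8_t *buf, int32_t v)
{
    uint32_t be = htonl((uint32_t) v);
    buf[0] = 4;
    memcpy(buf + 1, &be, 4);
    return 5;
}

int main(void)
{
    uint8_t buf[64];
    size_t n = 0;

    /* header: column names */
    n += put_str(buf + n, "hello");
    n += put_str(buf + n, "world");
    /* one row: 'hoi', 1 */
    n += put_str(buf + n, "hoi");
    n += put_int32(buf + n, 1);
    /* buf now holds: 5 h e l l o 5 w o r l d 3 h o i 4 0x0 0x0 0x0 0x1 */
    return 0;
}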
LS,

The question about binary versions of MAPI has recently been raised, and a test was performed by sending a large sequence of small SQL queries involving integers and strings.

On 21-10-2008 15:47:02 +0200, Fabian Groffen wrote:
100000 "select 1" using 1 connection over standard client utility: psql: 9.2s (2.8 user, 1.1 sys) mysql: 5.4s (1.1 user, 0.9 sys) monetdb: 7.6s (2.1 user, 1.3 sys)
more to follow.
10000 "select 'hello world'" psql: 9.2s (2.9 user, 1.2 sys) mysql: 5.4s (1.2 user, 0.8 sys) monetdb: 7.9s (2.1 user, 1.2 sys) A Valgrind analysis of MonetDB showed that the cost for the protocol is around 10% of this task. A little gain in parsing might be expected when the protocol carries only numerics and from the formatting/parsing overhead of tuples being transmitted (result-sets are mostly small) Consequently, moving to a binary protocol is not warranted by comparison, nor effective in terms of performance improvement. 50% of the cost in MonetDB for these queries are the SQL parser and query cache matching. Given the focus of MonetDB on datawarehousing, rather then high volume web-interactions, there has been no steps taken to improve the protocol. There are other major gains the made. Stefan de Konink wrote:
Hi,
In light of my recent philosophical question on doubles/uints for the lossy storage of coordinates, I wonder the following:
Would it be possible to implement a binary MAPI protocol that can use the native types of the data that is stored within MonetDB?
So for example; I create a table:
CREATE TABLE helloworld (hello varchar(13), world integer);
INSERT INTO helloworld (hello, world) VALUES ('hoi', 1);
And fetch this table:
SELECT hello, world FROM helloworld;
And receive the MAPI data in the following (too simplistic) way:

(len) (string)
--> 5 hello 5 world
--> 3 hoi 4 0x0 0x0 0x0 0x1
For several reasons the above will not work: the length field is too short (it only goes up to 255), and it is not optimal either. But I wonder if it can trigger any ideas for allowing the end user to use native types instead of char[]'s.
Stefan
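For reference, here is a minimal sketch of how such a micro-benchmark could be driven programmatically, assuming MonetDB's libmapi C client API rather than the command-line utilities used above; the connection parameters (host, port, credentials, database name) are placeholders.

#include <stdio.h>
#include <mapi.h>

int main(void)
{
    /* placeholder connection parameters: host, port, user, password,
       language, database */
    Mapi dbh = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                            "sql", "demo");
    if (dbh == NULL) {
        fprintf(stderr, "connect failed\n");
        return 1;
    }
    if (mapi_error(dbh)) {
        mapi_explain(dbh, stderr);
        return 1;
    }

    for (int i = 0; i < 100000; i++) {
        MapiHdl hdl = mapi_query(dbh, "select 1");
        if (hdl == NULL || mapi_error(dbh)) {
            mapi_explain(dbh, stderr);
            return 1;
        }
        while (mapi_fetch_row(hdl))
            (void) mapi_fetch_field(hdl, 0);   /* the '1' arrives as text */
        mapi_close_handle(hdl);
    }

    mapi_destroy(dbh);
    return 0;
}

Compiled with -lmapi and timed with time(1), this would approximate the per-query protocol and parsing overhead measured above.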
Martin Kersten wrote:
Consequently, moving to a binary protocol is not warranted by comparison, nor effective in terms of performance improvement: 50% of the cost in MonetDB for these queries is the SQL parser and query-cache matching.
Given the focus of MonetDB on data warehousing, rather than high-volume web interactions, no steps have been taken to improve the protocol.
You only mention here the performance gain within MonetDB, as opposed to the performance gain that could be had if the client does not need to parse integers from strings in order to work on them, but can instead keep an in-memory reference to the current row and treat it as a C struct.

Stefan
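To illustrate what Stefan is suggesting, here is a hypothetical sketch of a client decoding one row of the binary format proposed earlier straight into a C struct. The row layout and wire format are assumptions carried over from the helloworld example, not an existing MAPI feature.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl */

/* Hypothetical row structure matching
   CREATE TABLE helloworld (hello varchar(13), world integer); */
struct helloworld_row {
    char    hello[14];   /* varchar(13) plus terminating NUL */
    int32_t world;
};

/* Decode one length-prefixed row of the proposed format; returns the
   number of bytes consumed.  No strtol()/printf() round-trips: just a
   memcpy and a byte swap. */
static size_t decode_row(const uint8_t *buf, struct helloworld_row *row)
{
    size_t n = 0;
    uint8_t len;
    uint32_t be;

    len = buf[n++];                     /* string column */
    memcpy(row->hello, buf + n, len);
    row->hello[len] = '\0';
    n += len;

    len = buf[n++];                     /* integer column, len == 4 */
    memcpy(&be, buf + n, len);
    row->world = (int32_t) ntohl(be);   /* byte order still needs fixing up */
    n += len;

    return n;
}

int main(void)
{
    /* the row from the example: 3 'h' 'o' 'i' 4 0x0 0x0 0x0 0x1 */
    const uint8_t wire[] = { 3, 'h', 'o', 'i', 4, 0x0, 0x0, 0x0, 0x1 };
    struct helloworld_row row;

    decode_row(wire, &row);
    printf("%s %d\n", row.hello, row.world);   /* prints: hoi 1 */
    return 0;
}

Note that even then the client cannot simply alias the wire buffer as the struct: alignment, padding, and byte order still force a copy, which puts a bound on the gain.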
Martin Kersten wrote:
Consequently, moving to a binary protocol is not warranted by comparison, nor effective in terms of performance improvement: 50% of the cost in MonetDB for these queries is the SQL parser and query-cache matching.
Given the focus of MonetDB on data warehousing, rather than high-volume web interactions, no steps have been taken to improve the protocol.
Stefan de Konink wrote:

You only mention here the performance gain within MonetDB, as opposed to the performance gain that could be had if the client does not need to parse integers from strings in order to work on them, but can instead keep an in-memory reference to the current row and treat it as a C struct.

Stefan

Indeed. Even in an application it becomes relevant to know how much time is actually spent at the interface. My conjecture is that it is a few percent only, contrived cases excluded, and vastly outnumbered by even the network delays. Measuring applications and studying their behavior using e.g. Callgrind gives you reasonable insight into the performance bottlenecks. E.g., in your recent Callgrind file there appeared to be about a 10% loss of CPU cycles due to virtualization.
Martin Kersten wrote:
Indeed. Even in an application it becomes relevant to know how much time is actually spent at the interface. My conjecture is that it is a few percent only, contrived cases excluded, and vastly outnumbered by even the network delays.
I can totally agree with the last statement; we are indeed talking in milliseconds. But personally, if I could write a C program with some structures that match my query's output, that would *really* make my life as a programmer, in an object-oriented fashion, much better. I'd love to outline the pros and cons, especially related to pretty programming, but you have a point: this will only prevent a few parsing/formatting calls on the MonetDB side and the same on the MAPI client side, and in the best case it will reduce the number of function calls the application has to make to acquire individual column results (a sketch of this follows after this message).
Measuring applications and studying their behavior using e.g. Callgrind gives you reasonable insight into the performance bottlenecks. E.g., in your recent Callgrind file there appeared to be about a 10% loss of CPU cycles due to virtualization.
I think it is impossible to measure the cycles lost to virtualisation from within the VM itself; or would you be able to see the extra context switches? Nevertheless, debugging, asserting, and the VM simply allow me to share my work and hopefully improve it without a direct risk of breaking things. Not on the performance side :) (Though it is nice to see that even with all debugging options enabled the system still rocks in performance.)

Stefan
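As a concrete illustration of the parsing and function calls discussed above (the sketch referred to in the previous message): this is roughly what a client does today over the textual protocol, assuming MonetDB's libmapi C API and the hypothetical helloworld struct from earlier; connection parameters are placeholders.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <mapi.h>

struct helloworld_row {
    char    hello[14];
    int32_t world;
};

/* With the textual protocol, every integer travels as a string and has to
   be parsed per field; a binary protocol would turn the strtol() below
   into a plain copy.  NULL values are ignored for brevity. */
static void fetch_rows(MapiHdl hdl)
{
    struct helloworld_row row;

    while (mapi_fetch_row(hdl)) {
        snprintf(row.hello, sizeof(row.hello), "%s",
                 mapi_fetch_field(hdl, 0));
        row.world = (int32_t) strtol(mapi_fetch_field(hdl, 1), NULL, 10);
        printf("%s %d\n", row.hello, row.world);
    }
}

int main(void)
{
    /* placeholder connection parameters */
    Mapi dbh = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                            "sql", "demo");
    if (dbh == NULL || mapi_error(dbh))
        return 1;

    MapiHdl hdl = mapi_query(dbh, "SELECT hello, world FROM helloworld");
    if (hdl != NULL && !mapi_error(dbh)) {
        fetch_rows(hdl);
        mapi_close_handle(hdl);
    }

    mapi_destroy(dbh);
    return 0;
}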
participants (2)
- Martin Kersten
- Stefan de Konink