[Monetdb-developers] MapiClient asynchronous interface?
Hi all,
I'm using MonetDB as the backend for an asynchronous server and would like to stick to this concurrency model. I'm aware of the TCP/IP module, which supports async I/O, but as far as I understand it is meant for interaction between MonetDB servers. Could someone estimate the effort required to change MapiClient to perform asynchronous operations, compared with using the TCP/IP module from a client ("faking" another MonetDB instance)? My impression from a shallow look at the code is that the TCP/IP module transfers BATs in binary format, whereas the mapi module and MapiClient have to format and parse strings. Is that correct? If so, does anyone have an idea of the overhead that the string handling of the Mapi interface introduces?
Thanks a lot, Johann
Hi Johann, I don't really understand what you mean by "asynchronous" in the sense of Mapi(Client). Can you give an example of what you ideally would like to do? On 18-11-2006 15:47:47 +0100, Johann Borck wrote:
Hi all,
I'm using MonetDB as backend for an asynchronous server, and would like to stick to this concurrency model. I'm aware of the TCP/IP module, which supports async io, but as far as I understood it is made for interaction between MonetDB servers. Could someone make an estimation about the effort required to change MapiClient to perform asynchronous operations vs. making use of the TCP/IP module ("faking" another monetdb instance) from a client? My impression from a shallow look at the code is, that the TCP/IP module transfers bats in binary format, whereas mapi-module/MapiClient have to format/parse strings, is that correct? And if, does maybe someone have an idea about the overhead that the string handling of the mapi-interface introduces?
Thanks a lot, Johann
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
Fabian Groffen wrote:
Hi Johann,
I don't really understand what you mean by "asynchronous" in the sense of Mapi(Client). Can you give an example of what you ideally would like to do?
I'd like to use nonblocking sockets, so that I can send a request to MonetDB, and the call returns immediately. When monet sends data, the client is notified (by poll/select or similar) and processes data as it arrives. Because this is the core-functionality of my webserver anyway I'd like to prevent it from blocking on a recv/read on the socket, because that would force me to use dedicated threads for db-connections.
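The non-blocking pattern described here can be sketched with Python's standard `selectors` module. This is only an illustration of the I/O model over a plain socket: it does not speak the Mapi protocol, and the names `start_request` and `run_once` are hypothetical helpers, not part of MapiClient.

```python
import selectors
import socket


def start_request(sel, sock, request, on_data):
    """Send a small request on a non-blocking socket and return at once.

    `on_data(sock, data)` is called later, when the selector reports
    that the socket has become readable.
    """
    sock.setblocking(False)
    # Assumption: the request fits in the socket buffer. A complete
    # client would also watch EVENT_WRITE and handle partial sends.
    sock.send(request)
    sel.register(sock, selectors.EVENT_READ, on_data)


def run_once(sel, timeout=1.0):
    """One turn of the event loop: dispatch every readable socket."""
    for key, _events in sel.select(timeout):
        data = key.fileobj.recv(4096)  # b"" means the peer closed
        key.data(key.fileobj, data)
```

A web server would call `run_once` from the same loop that already serves its HTTP connections, so no thread is parked in `recv` while the database works.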
On 18-11-2006 16:11:04 +0100, Johann Borck wrote:
I'd like to use nonblocking sockets, so that I can send a request to MonetDB, and the call returns immediately. When monet sends data, the client is notified (by poll/select or similar) and processes data as it arrives. Because this is the core-functionality of my webserver anyway I'd like to prevent it from blocking on a recv/read on the socket, because that would force me to use dedicated threads for db-connections.
Ah, I see. No, this is not possible. The design of the whole protocol is based on answer/response rituals that are serial. This is also backing up transactions, hence will not change. The conventional way of solving this is by using threads for connections (e.g. connection pooling). If you don't want to go that route, there is not much we can do.
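The conventional approach mentioned here, blocking calls served by a small pool of threads, can be combined with an event loop. The following is a minimal sketch using `asyncio` and `concurrent.futures`; `blocking_query` is a hypothetical stand-in for a blocking Mapi round trip, not a real MonetDB call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(sql):
    """Hypothetical placeholder for a blocking request/response call."""
    time.sleep(0.05)  # pretend the server is working
    return f"result of {sql}"


async def run_queries(queries, pool_size=4):
    """Run blocking queries on worker threads without stalling the loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [loop.run_in_executor(pool, blocking_query, q)
                   for q in queries]
        # gather() returns results in the order the queries were given
        return await asyncio.gather(*futures)
```

The event loop stays single-threaded; only the database round trips occupy pool threads, and `pool_size` caps how many connections are in use at once.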
Hi Fabian, Johann
I can see the need for an asynchronous call. It works for applications that don't care about the outcome of a transaction. It also seems relatively easy to implement on the Mapi server side. You have to trap the async query call and handle it in the context of a new server thread that discards the output (>/dev/null). The 'asynquery' can immediately return to the client with an ACK and terminate the session.
Fabian Groffen wrote:
Ah, I see. No, this is not possible. The design of the whole protocol is based on answer/response rituals that are serial. This is also backing up transactions, hence will not change. The conventional way of solving this is by using threads for connections (e.g. connection pooling). If you don't want to go that route, there is not much we can do.
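Martin's fire-and-forget idea (acknowledge immediately, run the query in a separate thread, discard the output) might look roughly like this. It is a sketch only: `handle_async_query`, `execute` and `ack` are hypothetical names, and nothing here reflects how the Mapi server is actually structured.

```python
import threading


def handle_async_query(query, execute, ack):
    """Acknowledge at once and run the query in the background.

    `execute(query)` stands in for the server's query execution and
    `ack()` for sending the ACK back to the client. The result of
    `execute` is dropped, mirroring the ">/dev/null" in the proposal.
    """
    def worker():
        try:
            execute(query)  # output intentionally discarded
        except Exception:
            # The session is already closed, so a failure cannot be
            # reported back; the client never learns the outcome.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    ack()
    return thread
```

The `except` branch makes the trade-off visible: this only suits clients that truly do not care whether the transaction succeeded.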
On 18-11-2006 16:22:48 +0100, Martin Kersten wrote:
Hi Fabian, Johann
I can see the need for an asynchronous call. It works for applications that don't care about the outcome of a transaction. It also seems relatively easy to implement on the Mapi server side. You have to trap the async query call and handle it in the context of a new server thread that discards the output (>/dev/null). The 'asynquery' can immediately return to the client with an ACK and terminate the session.
Then what are the effects of a long-running query? I don't see any advantage here. You can just poll the sockets from one "main" thread, checking every once in a while whether there is data available to be read in the (TCP) buffers. With many connections, however, this is a deadly situation.
Since Johann is working with a webserver, not using threads in the first place sounds quite weird to me anyway.
Fabian Groffen wrote:
Then what are the effects of a long-running query? I don't see any advantage here. You can just poll the sockets from one "main" thread, checking every once in a while whether there is data available to be read in the (TCP) buffers. With many connections, however, this is a deadly situation.
Since Johann is working with a webserver, it sounds quite weird not to use threads in the first place to me anyway.
Well, that's a big discussion :) but using an event-driven state machine with single- or few-threaded servers is not really uncommon, and it has big advantages, e.g. the avoidance of context switches. Lighttpd is a prominent example, which performs extremely well compared to, say, Apache. For really large numbers of concurrent connections, the one-thread (or even one-process)-per-connection model really is a problem. Using epoll (or the new kevent) one can deal with thousands of simultaneous connections without problems, and in particular without wasting lots of memory (one object/struct vs. one thread) and CPU time (context switches plus locking issues).
When I have some time, I'll very likely give it a try for MapiClient. Changing the I/O model should not affect transactions; the asynchronous version will of course have to follow the same protocol as the synchronous version. I just thought maybe MonetDB supports this already, because the TCP/IP module has asynchronous methods.
Anyway, thanks for this great product, and regards, Johann
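The point that a non-blocking client can still follow the same serial protocol can be illustrated with a small per-connection state machine: at most one request is in flight, and the next one is sent only after the previous reply has arrived. This is a sketch under assumptions: the `Connection` class is hypothetical, and it assumes each reply is delivered whole, whereas a real client would need the protocol's framing to know where a reply ends.

```python
from collections import deque


class Connection:
    """One object per connection instead of one thread per connection.

    State is IDLE when `_callback` is None, otherwise WAITING for the
    reply to the single in-flight request.
    """

    def __init__(self, send):
        self._send = send        # callable that writes bytes to the socket
        self._pending = deque()  # queued (request, callback) pairs
        self._callback = None    # callback of the in-flight request

    def submit(self, request, callback):
        """Queue a request; it is sent as soon as the connection is idle."""
        self._pending.append((request, callback))
        self._pump()

    def on_response(self, response):
        """Called by the event loop when a complete reply has arrived."""
        if self._callback is None:
            raise RuntimeError("reply received with no request in flight")
        callback, self._callback = self._callback, None
        callback(response)
        self._pump()

    def _pump(self):
        # Send the next request only when nothing is in flight, which
        # keeps the exchange strictly serial on this connection.
        if self._callback is None and self._pending:
            request, self._callback = self._pending.popleft()
            self._send(request)
```

Because the serialisation lives in the state object, the server sees exactly the request/response order a blocking client would produce.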
On 18-11-2006 17:31:52 +0100, Johann Borck wrote:
Well, that's a big discussion :) but using an event-driven state machine with single- or few-threaded servers is not really an uncommon thing, and has big advantages, e.g. the avoidance of context-switches. Lighttpd is a prominent example, which performs extremely well compared to, say, Apache. For really big amounts of concurrent connections the one-thread (or even one-process)-per-connection model really is a problem. Using epoll (or the new kevent) one can deal with thousands of simultaneous connections without problems, especially without having to waste lots of memory (one object/struct vs one thread) and cpu-time (context-switches + dealing with locking issues).
Ok, but in this situation, it looks to me as if you would benefit from a "real" embedded version of MonetDB, where you just talk to a library instead, omitting all costs of Mapi/TCP and simply stick to method calls.
When I have some time, I'll very likely give it a try in case of the MapiClient. Changing the io-model should not affect transactions, the asynchronous version will of course have to follow the same protocol that the synchronous version does. I just thought maybe monetdb supports this already, because in the TCP/IP module there are asynchronous methods.
Is asynchronous equal to non-blocking IO in your case? The latter one should be too hard, I suppose...
Fabian Groffen wrote:
Ok, but in this situation, it looks to me as if you would benefit from a "real" embedded version of MonetDB, where you just talk to a library instead, omitting all costs of Mapi/TCP and simply stick to method calls.
Is this what's mentioned in the docs/examples as embedded mode, or something even lower level? I thought about using it embedded, but what scared me a bit is that locking is turned off completely when it is used this way, so big transactions could make smaller ones wait. OTOH, really big transactions are not very probable for my use case, and the embedded version is probably very, very fast :).
Is asynchronous equal to non-blocking IO in your case? The latter one should be too hard, I suppose...
Yep, it implies non-blocking IO. All right, I think I should first see how the embedded stuff works out. Thanks for the suggestion, Johann
Johann Borck wrote:
Fabian Groffen wrote:
Ok, but in this situation, it looks to me as if you would benefit from a "real" embedded version of MonetDB, where you just talk to a library instead, omitting all costs of Mapi/TCP and simply stick to method calls.
Is this what's mentioned in the docs/examples as embedded mode, or something even lower level? I thought about using it embedded, but what scared me a bit is that locking is turned off completely when it is used this way, so big transactions could make smaller ones wait. OTOH, really big transactions are not very probable for my use case, and the embedded version is probably very, very fast :).
Indeed, the embedded version was originally carved out to run on an SBC Linux board of an MP3 player. Transactions came from a single user. Concurrency control overhead could rise to >30% of all CPU cycles.
If your use case lets you differentiate short and long transactions, you might consider a queuing scheme that postpones the long ones. Alternatively, break the long one semantically into independent pieces. If your database fits in memory, I would definitely look at serial execution as the way to go and counter overload with an admission policy.
Martin
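The queuing scheme and admission policy suggested above could be prototyped in the application along these lines. This is a sketch with hypothetical names (`SerialScheduler`, `max_queued`); how a transaction is classified as long is left to the caller.

```python
from collections import deque


class SerialScheduler:
    """Serial execution with two queues and a simple admission limit."""

    def __init__(self, max_queued):
        self.max_queued = max_queued  # backlog limit across both queues
        self.short = deque()
        self.long = deque()

    def admit(self, job, is_long=False):
        """Queue `job` (a zero-argument callable), or reject on overload."""
        if len(self.short) + len(self.long) >= self.max_queued:
            return False  # admission policy: refuse instead of piling up
        (self.long if is_long else self.short).append(job)
        return True

    def run_next(self):
        """Run one job; long jobs are postponed while short ones wait."""
        queue = self.short or self.long  # an empty deque is falsy
        if not queue:
            return None
        return queue.popleft()()
```

Note that a steady stream of short jobs can starve the long queue; a production scheme would likely add an age limit or a reserved slot for long transactions.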
On 18-11-2006 19:14:31 +0100, Johann Borck wrote:
Fabian Groffen wrote:
Ok, but in this situation, it looks to me as if you would benefit from a "real" embedded version of MonetDB, where you just talk to a library instead, omitting all costs of Mapi/TCP and simply stick to method calls.
Is this what's mentioned in the docs/examples as embedded mode, or something even lower level? I thought about using it embedded, but what scared me a bit is that locking is turned off completely when it is used this way, so big transactions could make smaller ones wait. OTOH, really big transactions are not very probable for my use case, and the embedded version is probably very, very fast :).
Unfortunately the current "embedded" MonetDB is nothing but a "limited-to-one-connection" server, thus avoiding the need for locks. It still requires Mapi TCP connections and a running daemonised server, hence does require context switches and TCP send/receive actions.
Fabian Groffen wrote:
Unfortunately the current "embedded" MonetDB is nothing but a "limited-to-one-connection" server, thus avoiding the need for locks. It still requires Mapi TCP connections and a running daemonised server, hence does require context switches and TCP send/receive actions.
But this overhead can be removed relatively easily if you are willing to spend some time patching Mapi.mx. I guess it would cost just a day or two to get rid of the TCP part. I would still use a two-threaded embedded process structure, because this does not require protocol changes. Such a compile-time version could improve the embedded version significantly.
Of course, you can gain much more if you directly call SQL/XQRY/MIL/MAL procs and deal with their results yourself. But that costs weeks or months and is most likely application-specific.
Fabian Groffen wrote:
Ah, I see. No, this is not possible. The design of the whole protocol is based on answer/response rituals that are serial. This is also backing up transactions, hence will not change. The conventional way of solving this is by using threads for connections (e.g. connection pooling). If you don't want to go that route, there is not much we can do.
OK, so for now I have no choice. Thanks for the quick response, Johann
participants (3): Fabian Groffen, Johann Borck, Martin Kersten