[Monetdb-developers] MAPI cache bug: explained
Hi,

For two days now I have been very annoyed by the MAPI interface, with 'random' fallout in mapi_fetch_field. Having high confidence in my own code, I decided to walk the road of debugging the Mapi client library, and I'm reporting here.

Please point your editors at clients/src/mapilib/Mapi.mx

Around line 4800 you will find the function mapi_fetch_row. This is a public function and it gets borked randomly, but there seems to be a correlation with the number of rows fetched; let's move on.

I was able to trace the problem to the while loop where mapi_fetch_line is called. This is the first place where the function can return 0, and that is consistent with my logs.

So what is mapi_fetch_line all about? You can find it around line 4260. The idea of this function is that it should not return NULL, yet after some debugging it turns out that on the 'bug' occasion it always returns 0. Why?

If we explore the if statement, something interesting can be found:

    mapi_fetch_line bypass: 709 + 91 < 800 = 0
    result->cache.first + result->cache.tuplecount < result->row_count

That seems to make sense: 709 + 91 equals 800, so the condition is false and all rows are supposedly in the cache. But it also prevents the rest of mapi_fetch_line from being processed. So what is not doing its job? The logical answer is the function above it, mapi_fetch_line_internal.

In this cute function there is an interesting check:

    if (mid->active != hdl || hdl->needmore)

I was a bit distracted here, but when I explicitly checked both constraints, it turned out that mid->active != hdl is the failing one. And I guessed that mid->active (or actually hdl->mid->active) was in fact equal to NULL.

Then I read the declaration:

    MapiHdl active; /* set when not all rows have been received */

Isn't it strange that needmore is actually 0, but we still require an active handle? That suggested replacing || with &&, but even with the assert in read_into_cache removed, this produces even MORE failed data.

I hope I have outlined the problem in sufficient detail so that one of the core developers can look into where the failures of hdl->mid->active take place. If someone wants to see the live failures, they are welcome to ask for a private login.

Yours sincerely,

Stefan de Konink
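[For context, a minimal sketch of the call pattern in which this fallout shows up, using the public Mapi API; the connection parameters, table name, and query below are placeholders, not details from the report.]

    #include <stdio.h>
    #include <mapi.h>

    int main(void)
    {
        /* placeholder connection details */
        Mapi mid = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                                "sql", "demo");
        if (mid == NULL || mapi_error(mid) != MOK)
            return 1;

        MapiHdl hdl = mapi_query(mid, "SELECT * FROM some_table;");
        if (hdl == NULL)
            return 1;

        /* mapi_fetch_row returns 0 once the result set is exhausted --
           or, in the buggy case, well before all rows have arrived */
        while (mapi_fetch_row(hdl)) {
            char *v = mapi_fetch_field(hdl, 0); /* the 'random' NULL fallout */
            printf("%s\n", v ? v : "<NULL>");
        }

        mapi_close_handle(hdl);
        mapi_destroy(mid);
        return 0;
    }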
Stefan de Konink wrote:
Hi,
For two days now I have been very annoyed by the MAPI interface, with 'random' fallout in mapi_fetch_field. Having high confidence in my own code, I decided to walk the road of debugging the Mapi client library, and I'm reporting here.
Please point your editors at clients/src/mapilib/Mapi.mx
Around line 4800 you will find the function mapi_fetch_row. This is a public function and it gets borked randomly, but there seems to be a correlation with the number of rows fetched; let's move on.
I was able to trace the problem to the while loop where mapi_fetch_line is called. This is the first place where the function can return 0, and that is consistent with my logs.
So what is mapi_fetch_line all about? You can find it around line 4260. The idea of this function is that it should not return NULL, yet after some debugging it turns out that on the 'bug' occasion it always returns 0. Why?
If we explore the if statement, something interesting can be found:

    mapi_fetch_line bypass: 709 + 91 < 800 = 0
    result->cache.first + result->cache.tuplecount < result->row_count

That seems to make sense: 709 + 91 equals 800, so the condition is false and all rows are supposedly in the cache. But it also prevents the rest of mapi_fetch_line from being processed. So what is not doing its job? The logical answer is the function above it, mapi_fetch_line_internal.
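[To make the arithmetic concrete: a self-contained toy of just this check, reproducing the debug line above. The field names follow the email; this is not the real Mapi.mx code.]

    #include <stdio.h>

    struct cache { int first, tuplecount; };

    int main(void)
    {
        struct cache c = { 709, 91 };  /* values from the debug output */
        int row_count = 800;

        /* the guard in mapi_fetch_line asks "are rows still missing
           from the cache?" -- 709 + 91 == 800, so this prints 0
           (false) and the refill path is skipped */
        printf("mapi_fetch_line bypass: %d + %d < %d = %d\n",
               c.first, c.tuplecount, row_count,
               c.first + c.tuplecount < row_count);
        return 0;
    }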
In this cute function there is an interesting check:
if (mid->active != hdl || hdl->needmore)
I was a bit distracted here, but when I explicitly checked both constraints, it turned out that mid->active != hdl is the failing one. And I guessed that mid->active (or actually hdl->mid->active) was in fact equal to NULL.
Then I read the declaration:

    MapiHdl active; /* set when not all rows have been received */

Isn't it strange that needmore is actually 0, but we still require an active handle? That suggested replacing || with &&, but even with the assert in read_into_cache removed, this produces even MORE failed data.
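[A self-contained toy of that guard, with simplified stand-in types rather than the real Mapi declarations, shows why a NULL active handle is enough to trip it even when needmore is 0.]

    #include <stdio.h>
    #include <stddef.h>

    struct handle { int needmore; };
    struct conn   { struct handle *active; };

    static const char *fetch_line(struct conn *mid, struct handle *hdl)
    {
        /* the check from the email: with mid->active == NULL the left
           half is true even though hdl->needmore == 0, so we bail out */
        if (mid->active != hdl || hdl->needmore)
            return NULL;
        return "a line from the server";
    }

    int main(void)
    {
        struct handle hdl = { 0 };    /* needmore == 0 */
        struct conn   mid = { NULL }; /* active handle lost somewhere */

        const char *line = fetch_line(&mid, &hdl);
        printf("fetch_line -> %s\n",
               line ? line : "NULL (the observed failure)");
        return 0;
    }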
I think needmore is a bit of a red herring. It is set when the server needs more data from the client, i.e. when an incomplete statement was sent to the server and the server wants the rest of the statement. This will happen with multiline statements in mclient, but in "normal" applications that send complete statements in one go it should never happen. mid->active is set when data is sent to the server, to indicate which handle we expect an answer for from the server. It is cleared when the server indicates there is no more data (the PROMPTBEG case in read_into_cache). In any case, that is the theory.
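[In outline, that lifecycle can be modelled like this; a toy sketch of the description above, not the actual Mapi code.]

    #include <stddef.h>

    struct handle { int dummy; };
    struct conn   { struct handle *active; };

    /* sending a query: remember which handle the next answer belongs to */
    static void send_query(struct conn *mid, struct handle *hdl)
    {
        mid->active = hdl;
    }

    /* reading the answer: once the end-of-answer prompt (PROMPTBEG) is
       seen, the server is done and the handle is deactivated */
    static void read_answer(struct conn *mid, int saw_prompt)
    {
        if (saw_prompt)
            mid->active = NULL;
    }

    int main(void)
    {
        struct handle hdl = { 0 };
        struct conn   mid = { NULL };

        send_query(&mid, &hdl);  /* an answer is now expected for hdl */
        read_answer(&mid, 1);    /* prompt seen: mid->active cleared  */

        /* the reported bug amounts to reaching mapi_fetch_line_internal
           in this state: mid->active != hdl makes it return NULL even
           though the cached rows have not all been consumed yet */
        return mid.active == NULL ? 0 : 1;
    }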
I hope I have outlined the problem in sufficient detail so that one of the core developers can look into where the failures of hdl->mid->active take place. If someone wants to see the live failures, they are welcome to ask for a private login.
It would help if you could give a scenario under which things don't work so that we can try to reproduce. Can you share your code (or at least the relevant bit) with us? This can also be done off list. Also, it would be helpful if you could submit this as a bug report. That makes tracking it easier. -- Sjoerd Mullender
participants (2)
- Sjoerd Mullender
- Stefan de Konink