[Monetdb-developers] MAPI cache bug: explained
Hi,

For two days now I have been very annoyed by the MAPI interface, with 'random' fallout in mapi_fetch_field. Having high confidence in my own code, I decided to walk the road of debugging the Mapi client library, and I'm reporting here.

Please point your editors at clients/src/mapilib/Mapi.mx

Around line 4800 you will find the function mapi_fetch_row. This is a public function and it gets borked randomly, but there seems to be a correlation with the number of rows fetched; let's move on.

I was able to trace the problem to the while loop where mapi_fetch_line is called. This is the first place where the function can return 0, and that is consistent with my logs.

So what is mapi_fetch_line all about? You can find it around line 4260. The idea of this function is that it should not return NULL, yet after some debugging it turns out that on the 'bug' occasion it always returns 0. Why?

If we explore the if statement, something interesting can be found:

    mapi_fetch_line bypass: 709 + 91 < 800 = 0
    result->cache.first + result->cache.tuplecount < result->row_count

That seems to make sense: 709 + 91 equals 800, so the condition is false and all rows are supposedly in the cache. But it also prevents the rest of mapi_fetch_line from being processed. So what is not doing its job? The logical answer is the function above it, mapi_fetch_line_internal.

In this cute function there is an interesting check:

    if (mid->active != hdl || hdl->needmore)

I was a bit distracted here, but when I explicitly checked both constraints, it turned out that mid->active != hdl is the failing one. And I guessed that mid->active (or actually hdl->mid->active) was in fact equal to NULL.

Then I read the declaration:

    MapiHdl active; /* set when not all rows have been received */

Isn't it strange that needmore is actually 0, but we still require an active handle? That suggested replacing || with &&, but even with the assert in read_into_cache removed, this produces even MORE failed data.

I hope I have outlined the problem in sufficient detail so that one of the core developers can look into where the failures of hdl->mid->active take place. If someone wants to see the live failures, they are welcome to ask for a private login.

Yours sincerely,

Stefan de Konink
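[For context, a minimal sketch of the call pattern in which this fallout shows up, using the public Mapi API; the connection parameters, table name, and query below are placeholders, not details from the report.]

    #include <stdio.h>
    #include <mapi.h>

    int main(void)
    {
        /* placeholder connection details */
        Mapi mid = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                                "sql", "demo");
        if (mid == NULL || mapi_error(mid) != MOK)
            return 1;

        MapiHdl hdl = mapi_query(mid, "SELECT * FROM some_table;");
        if (hdl == NULL)
            return 1;

        /* mapi_fetch_row returns 0 once the result set is exhausted --
           or, in the buggy case, well before all rows have arrived */
        while (mapi_fetch_row(hdl)) {
            char *v = mapi_fetch_field(hdl, 0); /* the 'random' NULL fallout */
            printf("%s\n", v ? v : "<NULL>");
        }

        mapi_close_handle(hdl);
        mapi_destroy(mid);
        return 0;
    }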
Stefan de Konink wrote:
Hi,
For two days now I have been very annoyed by the MAPI interface, with 'random' fallout in mapi_fetch_field. Having high confidence in my own code, I decided to walk the road of debugging the Mapi client library, and I'm reporting here.
Please point your editors at clients/src/mapilib/Mapi.mx
Around line 4800 you will find the function mapi_fetch_row. This is a public function and it gets borked randomly, but there seems to be a correlation with the number of rows fetched; let's move on.
I was able to trace the problem to the while loop where mapi_fetch_line is called. This is the first place where the function can return 0, and that is consistent with my logs.
So what is mapi_fetch_line all about? You can find it around line 4260. The idea of this function is that it should not return NULL, yet after some debugging it turns out that on the 'bug' occasion it always returns 0. Why?
If we explore the if statement, something interesting can be found:

    mapi_fetch_line bypass: 709 + 91 < 800 = 0
    result->cache.first + result->cache.tuplecount < result->row_count

That seems to make sense: 709 + 91 equals 800, so the condition is false and all rows are supposedly in the cache. But it also prevents the rest of mapi_fetch_line from being processed. So what is not doing its job? The logical answer is the function above it, mapi_fetch_line_internal.
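[To make the arithmetic concrete: a self-contained toy of just this check, reproducing the debug line above. The field names follow the email; this is not the real Mapi.mx code.]

    #include <stdio.h>

    struct cache { int first, tuplecount; };

    int main(void)
    {
        struct cache c = { 709, 91 };  /* values from the debug output */
        int row_count = 800;

        /* the guard in mapi_fetch_line asks "are rows still missing
           from the cache?" -- 709 + 91 == 800, so this prints 0
           (false) and the refill path is skipped */
        printf("mapi_fetch_line bypass: %d + %d < %d = %d\n",
               c.first, c.tuplecount, row_count,
               c.first + c.tuplecount < row_count);
        return 0;
    }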
In this cute function there is an interesting check:
if (mid->active != hdl || hdl->needmore)
I was a bit distracted here, but when I explicitly checked both constraints, it turned out that mid->active != hdl is the failing one. And I guessed that mid->active (or actually hdl->mid->active) was in fact equal to NULL.
Then I read the declaration:

    MapiHdl active; /* set when not all rows have been received */

Isn't it strange that needmore is actually 0, but we still require an active handle? That suggested replacing || with &&, but even with the assert in read_into_cache removed, this produces even MORE failed data.
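[A self-contained toy of that guard, with simplified stand-in types rather than the real Mapi declarations, shows why a NULL active handle is enough to trip it even when needmore is 0.]

    #include <stdio.h>
    #include <stddef.h>

    struct handle { int needmore; };
    struct conn   { struct handle *active; };

    static const char *fetch_line(struct conn *mid, struct handle *hdl)
    {
        /* the check from the email: with mid->active == NULL the left
           half is true even though hdl->needmore == 0, so we bail out */
        if (mid->active != hdl || hdl->needmore)
            return NULL;
        return "a line from the server";
    }

    int main(void)
    {
        struct handle hdl = { 0 };    /* needmore == 0 */
        struct conn   mid = { NULL }; /* active handle lost somewhere */

        const char *line = fetch_line(&mid, &hdl);
        printf("fetch_line -> %s\n",
               line ? line : "NULL (the observed failure)");
        return 0;
    }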
I think needmore is a bit of a red herring. It is set when the server needs more data from the client, i.e. when an incomplete statement was sent to the server and the server wants the rest of the statement. This will happen with multiline statements in mclient, but in "normal" applications that send complete statements in one go it should never happen. mid->active is set when data is sent to the server, to indicate which handle we expect an answer for from the server. It is cleared when the server indicates there is no more data (the PROMPTBEG case in read_into_cache). In any case, that is the theory.
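[In outline, that lifecycle can be modelled like this; a toy sketch of the description above, not the actual Mapi code.]

    #include <stddef.h>

    struct handle { int dummy; };
    struct conn   { struct handle *active; };

    /* sending a query: remember which handle the next answer belongs to */
    static void send_query(struct conn *mid, struct handle *hdl)
    {
        mid->active = hdl;
    }

    /* reading the answer: once the end-of-answer prompt (PROMPTBEG) is
       seen, the server is done and the handle is deactivated */
    static void read_answer(struct conn *mid, int saw_prompt)
    {
        if (saw_prompt)
            mid->active = NULL;
    }

    int main(void)
    {
        struct handle hdl = { 0 };
        struct conn   mid = { NULL };

        send_query(&mid, &hdl);  /* an answer is now expected for hdl */
        read_answer(&mid, 1);    /* prompt seen: mid->active cleared  */

        /* the reported bug amounts to reaching mapi_fetch_line_internal
           in this state: mid->active != hdl makes it return NULL even
           though the cached rows have not all been consumed yet */
        return mid.active == NULL ? 0 : 1;
    }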
I hope I have outlined the problem in sufficient detail so that one of the core developers can look into where the failures of hdl->mid->active take place. If someone wants to see the live failures, they are welcome to ask for a private login.
It would help if you could give a scenario under which things don't work so that we can try to reproduce. Can you share your code (or at least the relevant bit) with us? This can also be done off list. Also, it would be helpful if you could submit this as a bug report. That makes tracking it easier. -- Sjoerd Mullender
participants (2)
- Sjoerd Mullender
- Stefan de Konink