potential data contention in _list_find_name() during BAT binding?
Hi,

I'm reading the source code of MonetDB and I have a question about multi-threaded BAT binding. I would appreciate it if someone could help answer it.

Here is my question:

When a sql_bind() MAL instruction is called, I see a call stack something like:

    mvc_bind_wrap -> mvc_bind -> mvc_bind_column -> find_sql_column -> _cs_find_name -> _list_find_name

In _list_find_name, we have something like:

    if (l && !l->ht && list_length(l) > HASH_MIN_SIZE && l->sa) {
        l->ht = hash_new(l->sa, list_length(l), (fkeyvalue)&base_key);

        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;
            int key = base_key(b);

            hash_add(l->ht, key, b);
        }
    }
    if (l && l->ht) {
        int key = hash_key(name);
        sql_hash_e *he = l->ht->buckets[key&(l->ht->size-1)];

        for (; he; he = he->chain) {
            sql_base *b = he->value;

            if (b->name && strcmp(b->name, name) == 0)
                return b;
        }
        return NULL;
    }

    if (l)
        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;

            /* check if names match */
            if (name[0] == b->name[0] && strcmp(name, b->name) == 0) {
                return b;
            }
        }

It looks like the function tries to create a hash table for the list the first time it is called. However, there is no protection against data contention. Now suppose the mitosis optimizer is turned on and we could have something like:

    X_157:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",0,8);
    X_159:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",1,8);
    X_161:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",2,8);
    X_163:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",3,8);
    X_165:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",4,8);
    X_167:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",5,8);
    X_169:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",6,8);
    X_171:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",7,8);

If my understanding is correct, since these MAL instructions have no dependencies on each other, they could potentially be executed in different threads simultaneously. How would MonetDB prevent data contention in _list_find_name()?

In particular, consider the case where thread A has created the hash table but has not yet inserted the elements, while thread B searches the list, thinks the hash table is ready, and starts to use it. It will fail to return the correct column since the hash table is empty, right?

Thanks,
Mengmeng
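The interleaving described above can be replayed deterministically in a single thread. The following is only a minimal sketch under stated assumptions: `entry`, `table`, `name_list` and the helper functions are hypothetical, simplified stand-ins for the sql_base list and its hash, not MonetDB's actual types, and the column names are illustrative. It splits the lazy build into its two steps so that a lookup can be placed in the window between them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8   /* power of two, so (key & (NBUCKETS - 1)) picks a bucket */

/* Hypothetical stand-ins for sql_base, the hash and the list. */
typedef struct entry { const char *name; struct entry *chain; } entry;
typedef struct { entry *buckets[NBUCKETS]; } table;
typedef struct { entry *items; size_t n; table *ht; } name_list;

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Step 1 of the lazy build: the (still empty) table becomes visible. */
static void publish_empty_table(name_list *l, table *t)
{
    memset(t, 0, sizeof *t);
    l->ht = t;
}

/* Step 2 of the lazy build: the entries are added to the visible table. */
static void fill_table(name_list *l)
{
    assert(l->ht != NULL);
    for (size_t i = 0; i < l->n; i++) {
        entry *e = &l->items[i];
        unsigned k = hash_name(e->name) & (NBUCKETS - 1);
        e->chain = l->ht->buckets[k];
        l->ht->buckets[k] = e;
    }
}

/* Same shape as the lookup in the post: once a table exists it is trusted
 * exclusively, and the linear scan is used only when there is no table. */
static const entry *find_name(const name_list *l, const char *name)
{
    if (l->ht) {
        for (entry *e = l->ht->buckets[hash_name(name) & (NBUCKETS - 1)]; e; e = e->chain)
            if (strcmp(e->name, name) == 0)
                return e;
        return NULL;
    }
    for (size_t i = 0; i < l->n; i++)
        if (strcmp(l->items[i].name, name) == 0)
            return &l->items[i];
    return NULL;
}

/* Replays "A publishes, B looks up, A fills". Returns 1 when B's lookup
 * misses a name that is present and is found again once the fill is done. */
static int lookup_misses_in_window(void)
{
    entry cols[] = { {"l_orderkey", NULL}, {"l_partkey", NULL}, {"l_quantity", NULL} };
    name_list l = { cols, 3, NULL };
    table t;

    publish_empty_table(&l, &t);                          /* thread A, step 1 */
    const entry *seen_by_b = find_name(&l, "l_partkey");  /* thread B */
    fill_table(&l);                                       /* thread A, step 2 */
    const entry *after = find_name(&l, "l_partkey");

    return seen_by_b == NULL && after != NULL;
}
```

In this model `lookup_misses_in_window()` returns 1: the lookup that runs between the two steps takes the hash branch and returns NULL for a column that exists. Whether real threads can ever reach that window is the question posed to the list.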
On Mon, Aug 12, 2013 at 03:07:47PM -0700, Mengmeng Chen wrote:
> If my understanding is correct, since these MAL instructions have no
> dependencies on each other, they could potentially be executed in
> different threads simultaneously. How would MonetDB prevent data
> contention in _list_find_name()? Particularly the case where thread A
> has created a hash table and has not yet inserted the elements, while
> thread B tries to search through the list, thinks the hash table is
> ready, and starts to use it. It will fail to return the correct column
> since the hash table is empty, right?
If the hash were created during querying, your conclusion would be right. But that isn't the case: the hash table gets created during table creation, table loading (server startup) and plan creation. None of these are concurrent.

Niels
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
--
Niels Nes, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098
sip:4098@sip.cwi.nl
url: http://www.cwi.nl/~niels
e-mail: Niels.Nes@cwi.nl
Hi Niels,
Thank you for the answer.
How about remote table access, say in my example if `lineitem` is defined
as a remote table?
In my own debugging, it seems that when the remote function is running on
the remote site, the 'loading' of the table is happening at run time and
could be multithreaded.
Thanks,
Mengmeng
On Tue, Aug 13, 2013 at 10:54:23AM -0700, Mengmeng Chen wrote:
> How about remote table access, say in my example if `lineitem` is defined
> as a remote table?
> In my own debugging, it seems that when the remote function is running on
> the remote site, the 'loading' of the table is happening at run time and
> could be multithreaded.

Here you may have a point. The query is compiled at the local site, i.e. the hash gets created during the first query execution. This could be a problem. Seems I have some fixing to do.

Niels
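For readers wondering what guarding a lazily built hash could look like, here is a minimal sketch under stated assumptions. It is not the change that went into MonetDB (the thread does not say what that was), and `entry`, `table`, `name_list`, `list_init`, `find_name` and `concurrent_hits` are hypothetical names. The idea is to fill the table completely before publishing its pointer: builders are serialized by a mutex, and readers see the pointer only through a C11 acquire load that pairs with the builder's release store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8      /* power of two, so (key & (NBUCKETS - 1)) picks a bucket */
#define MAX_PROBES 16   /* upper bound on helper threads in concurrent_hits() */

/* Hypothetical stand-ins for sql_base, the hash and the list. */
typedef struct entry { const char *name; struct entry *chain; } entry;
typedef struct { entry *buckets[NBUCKETS]; } table;
typedef struct {
    entry *items;
    size_t n;
    table storage;          /* backing memory for the hash */
    _Atomic(table *) ht;    /* NULL until the hash is completely filled */
    pthread_mutex_t lock;   /* serializes the one-time build */
} name_list;

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static void list_init(name_list *l, entry *items, size_t n)
{
    l->items = items;
    l->n = n;
    atomic_init(&l->ht, NULL);
    pthread_mutex_init(&l->lock, NULL);
}

static const entry *find_name(name_list *l, const char *name)
{
    assert(l != NULL && name != NULL);
    table *t = atomic_load_explicit(&l->ht, memory_order_acquire);
    if (t == NULL) {
        pthread_mutex_lock(&l->lock);
        t = atomic_load_explicit(&l->ht, memory_order_relaxed);
        if (t == NULL) {                 /* this thread won: build the table */
            t = &l->storage;
            memset(t, 0, sizeof *t);
            for (size_t i = 0; i < l->n; i++) {
                entry *e = &l->items[i];
                unsigned k = hash_name(e->name) & (NBUCKETS - 1);
                e->chain = t->buckets[k];
                t->buckets[k] = e;
            }
            /* Publish only after every entry has been inserted. */
            atomic_store_explicit(&l->ht, t, memory_order_release);
        }
        pthread_mutex_unlock(&l->lock);
    }
    for (entry *e = t->buckets[hash_name(name) & (NBUCKETS - 1)]; e; e = e->chain)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

typedef struct { name_list *l; const char *name; int found; } probe;

static void *probe_thread(void *arg)
{
    probe *p = arg;
    p->found = find_name(p->l, p->name) != NULL;
    return NULL;
}

/* Starts up to MAX_PROBES threads that look up the same name at once and
 * returns how many of them found it. */
static int concurrent_hits(name_list *l, const char *name, int nthreads)
{
    pthread_t tid[MAX_PROBES];
    probe p[MAX_PROBES];
    int started = 0, hits = 0;

    if (nthreads > MAX_PROBES)
        nthreads = MAX_PROBES;
    for (int i = 0; i < nthreads; i++) {
        p[i].l = l;
        p[i].name = name;
        p[i].found = 0;
        if (pthread_create(&tid[i], NULL, probe_thread, &p[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        hits += p[i].found;
    }
    return hits;
}
```

With this ordering a reader either sees NULL and waits on the mutex, or sees a pointer to a fully populated table, so the empty-table window from the original question cannot be observed. A simpler variant would hold the mutex around every lookup, trading some concurrency for less subtle code.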
participants (2)
- Mengmeng Chen
- Niels Nes