On Mon, Aug 12, 2013 at 03:07:47PM -0700, Mengmeng Chen wrote:
Hi,
I'm reading the MonetDB source code and have a question about multi-threaded BAT binding. I would appreciate it if someone could help answer it.
Here is my question:
When a sql_bind() MAL instruction is called, I see a call stack like this:
mvc_bind_wrap -> mvc_bind -> mvc_bind_column -> find_sql_column -> _cs_find_name -> _list_find_name
In _list_find_name, we have something like:
    if (l && !l->ht && list_length(l) > HASH_MIN_SIZE && l->sa) {
        l->ht = hash_new(l->sa, list_length(l), (fkeyvalue)&base_key);

        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;
            int key = base_key(b);

            hash_add(l->ht, key, b);
        }
    }
    if (l && l->ht) {
        int key = hash_key(name);
        sql_hash_e *he = l->ht->buckets[key&(l->ht->size-1)];

        for (; he; he = he->chain) {
            sql_base *b = he->value;

            if (b->name && strcmp(b->name, name) == 0)
                return b;
        }
        return NULL;
    }
    if (l)
        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;

            /* check if names match */
            if (name[0] == b->name[0] && strcmp(name, b->name) == 0) {
                return b;
            }
        }
It looks like the function creates a hash table for the list the first time it is called. However, nothing protects that step against a data race. Now suppose the mitosis optimizer is turned on, so that the plan contains something like:
    X_157:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",0,8);
    X_159:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",1,8);
    X_161:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",2,8);
    X_163:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",3,8);
    X_165:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",4,8);
    X_167:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",5,8);
    X_169:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",6,8);
    X_171:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",7,8);
If my understanding is correct, these MAL instructions have no dependencies on each other, so they could be executed in different threads simultaneously. How does MonetDB prevent a data race in _list_find_name()? In particular, consider the case where thread A has created the hash table but has not yet inserted the elements, while thread B searches the list, sees that the hash table exists, and starts to use it. Thread B would fail to return the correct column because the hash table is still empty, right?
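To make the interleaving described above concrete, here is a minimal single-threaded replay of it. This is only a sketch: `toy_list`, `toy_find`, `toy_hash`, and `replay_interleaving` are hypothetical names invented for illustration and are not MonetDB code. The lookup has the same shape as the one quoted above (use the hash table as soon as the pointer is set), and the replay performs the two threads' steps in the problematic order by hand.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8 /* power of two, so hash & (NBUCKETS - 1) picks a bucket */

typedef struct entry { const char *name; struct entry *chain; } entry;
typedef struct table { entry *buckets[NBUCKETS]; } table;
typedef struct toy_list { const char **names; int n; table *ht; } toy_list;

static unsigned toy_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Hypothetical lookup: trusts the hash table as soon as l->ht is non-NULL,
 * otherwise falls back to a linear scan of the list. */
static const char *toy_find(const toy_list *l, const char *name)
{
    if (l->ht) {
        entry *e = l->ht->buckets[toy_hash(name) & (NBUCKETS - 1)];
        for (; e; e = e->chain)
            if (strcmp(e->name, name) == 0)
                return e->name;
        return NULL;
    }
    for (int i = 0; i < l->n; i++)
        if (strcmp(l->names[i], name) == 0)
            return l->names[i];
    return NULL;
}

/* Replays the steps of "thread A" and "thread B" in the problematic order.
 * Returns 1 if B's lookup missed while the table was still empty and the
 * same lookup succeeded once A had finished filling it. */
static int replay_interleaving(void)
{
    static const char *cols[] = { "l_orderkey", "l_partkey", "l_quantity" };
    table t;
    entry slots[3];
    toy_list l = { cols, 3, NULL };

    memset(&t, 0, sizeof t);

    l.ht = &t; /* A: table pointer becomes visible, no elements added yet */
    const char *seen_by_b = toy_find(&l, "l_partkey"); /* B: looks up now */

    for (int i = 0; i < l.n; i++) { /* A: only now inserts the elements */
        unsigned k = toy_hash(cols[i]) & (NBUCKETS - 1);
        slots[i].name = cols[i];
        slots[i].chain = t.buckets[k];
        t.buckets[k] = &slots[i];
    }
    const char *seen_later = toy_find(&l, "l_partkey");

    return seen_by_b == NULL && seen_later != NULL;
}
```

Calling `replay_interleaving()` from a `main` function shows the miss: the column is in the list the whole time, but the lookup that runs between "table created" and "table filled" does not find it.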
If the hash table were created during querying, your conclusion would be right. But that isn't the case: the hash table gets created during table creation, table loading (server startup), and plan creation. None of these run concurrently.

Niels
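The ordering described in this reply (build everything in a non-concurrent phase, then only read) can be sketched as follows. This is an illustration under assumptions, not MonetDB's implementation: `catalog`, `catalog_build`, `catalog_find`, and `catalog_free` are hypothetical names. The table is completely filled before the pointer is returned to anyone, and the lookup takes a `const` pointer and performs no writes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CAT_BUCKETS 16 /* power of two */

typedef struct cat_entry {
    const char *name;
    int colnr;
    struct cat_entry *chain;
} cat_entry;

typedef struct catalog {
    cat_entry *buckets[CAT_BUCKETS];
    cat_entry *slots; /* backing storage for all entries */
} catalog;

static unsigned cat_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Setup phase (assumed single-threaded): returns a fully populated
 * catalog, or NULL if allocation fails. */
static catalog *catalog_build(const char *const *names, int n)
{
    catalog *c = calloc(1, sizeof *c);
    if (!c)
        return NULL;
    c->slots = calloc(n > 0 ? (size_t)n : 1, sizeof *c->slots);
    if (!c->slots) {
        free(c);
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        unsigned k = cat_hash(names[i]) & (CAT_BUCKETS - 1);
        c->slots[i].name = names[i];
        c->slots[i].colnr = i;
        c->slots[i].chain = c->buckets[k];
        c->buckets[k] = &c->slots[i];
    }
    return c;
}

/* Query phase: read-only lookup. Returns the column number, or -1. */
static int catalog_find(const catalog *c, const char *name)
{
    const cat_entry *e = c->buckets[cat_hash(name) & (CAT_BUCKETS - 1)];
    for (; e; e = e->chain)
        if (strcmp(e->name, name) == 0)
            return e->colnr;
    return -1;
}

static void catalog_free(catalog *c)
{
    if (c) {
        free(c->slots);
        free(c);
    }
}
```

Because `catalog_find` never modifies the structure, any number of worker threads started after `catalog_build` has returned can call it without locking. The guarantee comes from the ordering (build first, start readers afterwards), not from anything inside the lookup itself.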
Thanks, Mengmeng
--
Niels Nes, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl
url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl