Hi,
I'm reading the MonetDB source code and have a question about
multi-threaded BAT binding. I would appreciate it if someone could
help answer it.

When a sql_bind() MAL instruction is called, I see a call stack like this:
mvc_bind_wrap -> mvc_bind -> mvc_bind_column -> find_sql_column -> _cs_find_name -> _list_find_name
In _list_find_name, we have something like:
    if (l && !l->ht && list_length(l) > HASH_MIN_SIZE && l->sa) {
        l->ht = hash_new(l->sa, list_length(l), (fkeyvalue)&base_key);
        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;
            int key = base_key(b);
            hash_add(l->ht, key, b);
        }
    }
    if (l && l->ht) {
        int key = hash_key(name);
        sql_hash_e *he = l->ht->buckets[key&(l->ht->size-1)];
        for (; he; he = he->chain) {
            sql_base *b = he->value;
            if (b->name && strcmp(b->name, name) == 0)
                return b;
        }
        return NULL;
    }
    if (l)
        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;
            /* check if names match */
            if (name[0] == b->name[0] && strcmp(name, b->name) == 0) {
                return b;
            }
        }
It looks like the function builds a hash table for the list the first
time it is called on a list longer than HASH_MIN_SIZE. However, nothing
protects this lazy initialization against concurrent access. Now suppose
the mitosis optimizer is turned on, so that we have something like:
X_157:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",0,8);
X_159:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",1,8);
X_161:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",2,8);
X_163:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",3,8);
X_165:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",4,8);
X_167:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",5,8);
X_169:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",6,8);
X_171:bat[:oid,:oid] := sql.tid(X_4,"sys","lineitem",7,8);
If my understanding is correct, these MAL instructions have no
dependencies on each other, so they could be executed in different
threads simultaneously. How does MonetDB prevent a data race in
_list_find_name()? In particular, consider the case where thread A has
created the hash table (so l->ht is already set) but has not yet
inserted the elements, while thread B searches the same list, sees that
l->ht is non-NULL, assumes the hash table is ready, and starts to use
it. Thread B would fail to return the correct column because the hash
table is still empty, right?
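
To make clear what kind of protection I was expecting, here is a small
stand-alone sketch. It is not MonetDB code: name_list, build_table and
find_name are names I made up for illustration, and it assumes POSIX
threads. The idea is to populate the table privately and assign it to
the shared pointer only once it is complete, all under a mutex, so that
no thread can ever observe a non-NULL but still empty table.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8 /* power of two, so hash & (NBUCKETS - 1) picks a bucket */

/* All types and functions below are hypothetical, for illustration only. */
typedef struct entry {
    const char *name;
    struct entry *chain; /* next entry in the same bucket */
} entry;

typedef struct table {
    entry *buckets[NBUCKETS];
    entry *entries; /* backing storage, one entry per name */
} table;

typedef struct name_list {
    const char **names;   /* the underlying list */
    size_t count;
    table *ht;            /* NULL until a fully built table is published */
    pthread_mutex_t lock; /* guards creation and publication of ht */
} name_list;

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Build the complete table privately; no other thread can see it yet. */
static table *build_table(const name_list *l)
{
    table *t = calloc(1, sizeof(*t));
    if (t == NULL)
        return NULL;
    t->entries = calloc(l->count, sizeof(*t->entries));
    if (t->entries == NULL) {
        free(t);
        return NULL;
    }
    for (size_t i = 0; i < l->count; i++) {
        unsigned b = hash_name(l->names[i]) & (NBUCKETS - 1);
        t->entries[i].name = l->names[i];
        t->entries[i].chain = t->buckets[b];
        t->buckets[b] = &t->entries[i];
    }
    return t;
}

/* Return the stored name equal to `name`, or NULL if it is not in the list. */
const char *find_name(name_list *l, const char *name)
{
    assert(l != NULL && name != NULL);

    pthread_mutex_lock(&l->lock);
    if (l->ht == NULL)
        l->ht = build_table(l); /* assigned only after it is fully populated */
    table *t = l->ht;
    pthread_mutex_unlock(&l->lock);

    if (t != NULL) {
        /* The table is never modified after publication, so reading it
         * without the lock is safe. */
        entry *e = t->buckets[hash_name(name) & (NBUCKETS - 1)];
        for (; e; e = e->chain)
            if (strcmp(e->name, name) == 0)
                return e->name;
        return NULL;
    }
    /* Allocation failed: fall back to a linear scan of the list. */
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return l->names[i];
    return NULL;
}
```

With this ordering, thread B either sees ht == NULL and waits on the
lock while thread A builds the table, or sees the finished table. Every
lookup pays for one lock/unlock pair, which is the price of keeping the
sketch simple.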
Thanks,
Mengmeng