Hi,

I'm reading the MonetDB source code and have a question about multi-threaded BAT binding. I would appreciate it if someone could help answer it.

Here is my question:

When a sql_bind() MAL instruction is called, I see a call stack something like:

mvc_bind_wrap -> mvc_bind -> mvc_bind_column -> find_sql_column -> _cs_find_name -> _list_find_name

In _list_find_name, we have something like:

    if (l && !l->ht && list_length(l) > HASH_MIN_SIZE && l->sa) {
        l->ht = hash_new(l->sa, list_length(l), (fkeyvalue)&base_key);

        for (n = l->h; n; n = n->next ) {
            sql_base *b = n->data;
            int key = base_key(b);

            hash_add(l->ht, key, b);
        }
    }
    if (l && l->ht) {
        int key = hash_key(name);
        sql_hash_e *he = l->ht->buckets[key&(l->ht->size-1)];

        for (; he; he = he->chain) {
            sql_base *b = he->value;

            if (b->name && strcmp(b->name, name) == 0)
                return b;
        }
        return NULL;
    }

    if (l)
        for (n = l->h; n; n = n->next) {
            sql_base *b = n->data;

            /* check if names match */
            if (name[0] == b->name[0] && strcmp(name, b->name) == 0) {
                return b;
            }
        }

It looks like the function builds a hash table for the list the first time it is called on a list longer than HASH_MIN_SIZE. However, nothing protects l->ht against concurrent access. Now suppose the mitosis optimizer is turned on and the plan contains something like:

 X_157:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",0,8);
 X_159:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",1,8);
 X_161:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",2,8);
 X_163:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",3,8);
 X_165:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",4,8);
 X_167:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",5,8);
 X_169:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",6,8);
 X_171:bat[:oid,:oid]  := sql.tid(X_4,"sys","lineitem",7,8);

If my understanding is correct, since these MAL instructions have no dependencies on each other, they could be executed in different threads simultaneously. How does MonetDB prevent a data race in _list_find_name()? In particular, consider the case where thread A has created the hash table but has not yet inserted the elements, while thread B searches the same list, sees a non-NULL l->ht, assumes the hash table is ready, and starts to use it. Thread B would then fail to return the correct column because the hash table is still empty, right?
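
To make the scenario concrete, here is a minimal sketch of the kind of protection I expected to find. It is not MonetDB code: name_list, build_table, find_name, run_demo and the pthread mutex are all hypothetical names of mine, and the list is simplified to an array of strings. The idea is to populate the table completely before storing the pointer in the shared field, and to do the check and the build under a lock, so that no other thread can ever see a half-filled table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16  /* power of two, so (key & (NBUCKETS - 1)) selects a bucket */
#define NCOLS    32
#define NTHREADS 8

typedef struct entry { const char *name; struct entry *chain; } entry;
typedef struct table { entry *buckets[NBUCKETS]; } table;

/* Hypothetical stand-in for a name list; not MonetDB's real list type. */
typedef struct name_list {
    char names[NCOLS][16];
    table *ht;             /* NULL until a fully built table is published */
    pthread_mutex_t lock;  /* assumed lock guarding the lazy build */
} name_list;

static unsigned hash_name(const char *s) {
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Build and fill the table privately; nothing shared is modified here. */
static table *build_table(const name_list *l) {
    table *t = calloc(1, sizeof *t);
    assert(t != NULL);
    for (int i = 0; i < NCOLS; i++) {
        entry *e = malloc(sizeof *e);
        assert(e != NULL);
        unsigned b = hash_name(l->names[i]) & (NBUCKETS - 1);
        e->name = l->names[i];
        e->chain = t->buckets[b];
        t->buckets[b] = e;
    }
    return t;
}

static const char *find_name(name_list *l, const char *name) {
    pthread_mutex_lock(&l->lock);
    if (l->ht == NULL)
        l->ht = build_table(l);  /* published only after it is complete */
    table *t = l->ht;
    pthread_mutex_unlock(&l->lock);

    /* The table is never modified after publication, so probing it
       without the lock is safe in this sketch. */
    for (entry *e = t->buckets[hash_name(name) & (NBUCKETS - 1)]; e; e = e->chain)
        if (strcmp(e->name, name) == 0)
            return e->name;
    return NULL;
}

/* Each worker looks up every name and reports how many lookups failed. */
static void *worker(void *arg) {
    name_list *l = arg;
    intptr_t misses = 0;
    for (int i = 0; i < NCOLS; i++)
        if (find_name(l, l->names[i]) == NULL)
            misses++;
    return (void *)misses;
}

/* Start NTHREADS concurrent readers; returns the total number of misses. */
int run_demo(void) {
    name_list l;
    pthread_t tid[NTHREADS];
    intptr_t total = 0;

    l.ht = NULL;
    pthread_mutex_init(&l.lock, NULL);
    for (int i = 0; i < NCOLS; i++)
        snprintf(l.names[i], sizeof l.names[i], "col%d", i);

    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &l);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        void *res;
        pthread_join(tid[i], &res);
        total += (intptr_t)res;
    }

    for (int b = 0; b < NBUCKETS; b++)
        for (entry *e = l.ht->buckets[b], *next; e; e = next) {
            next = e->chain;
            free(e);
        }
    free(l.ht);
    pthread_mutex_destroy(&l.lock);
    return (int)total;
}
```

Calling run_demo() from a small main() should report zero misses, because a thread either builds the whole table itself or only sees the pointer after the builder has released the lock. In the code I quoted above, l->ht is assigned before the insert loop runs and I cannot see any lock around it, which is why I suspect the race.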



Thanks,
Mengmeng