I've reimported my 350M rows on a different machine (rather than my workstation) and I'm re-running the same simple query.
It's still taking hours on a box with RAID SSDs and ~130GB of RAM, and it's generating a multi-GB thash file. CPU usage is very low throughout.
The explain output for the first query is slightly different from what I got when running it locally:
sql>explain select count(*) from t_order2 where orderid = 'XXXX';
+-----------------------------------------------------------------+
| mal |
+=================================================================+
| function user.s6_2{autoCommit=true}(A0:str):void; |
| X_3 := sql.mvc(); |
| X_7 := sql.bind(X_3,"sys","t_order2","orderid",0); |
| X_4:bat[:oid,:oid] := sql.tid(X_3,"sys","t_order2"); |
| X_37 := algebra.subselect(X_7,X_4,A0,A0,true,true,false); |
| (X_10,r1_10) := sql.bind(X_3,"sys","t_order2","orderid",2); |
| X_38 := algebra.subselect(r1_10,A0,A0,true,true,false); |
| X_13 := sql.bind(X_3,"sys","t_order2","orderid",1); |
| X_39 := algebra.subselect(X_13,X_4,A0,A0,true,true,false); |
| X_15 := sql.subdelta(X_37,X_4,X_10,X_38,X_39); |
| X_17 := sql.projectdelta(X_15,X_7,X_10,r1_10,X_13); |
| X_18 := aggr.count(X_17); |
| sql.exportValue(1,"sys.L1","L1","wrd",64,0,6,X_18,""); |
| end s6_2; |
| # optimizer.mitosis() |
| # optimizer.dataflow() |
+-----------------------------------------------------------------+
Is that thash file a lazily created index that will speed up subsequent queries, or will it be rebuilt every time? If it's the former, is there any way to specify up front that I want it created and maintained?
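Ideally I'd like to be able to declare this intent at schema-creation time, something along these lines (I'm assuming here that CREATE INDEX is accepted syntax and acts at least as a hint; I haven't verified what it actually does to the hash):

```sql
-- Assumption on my part: treat CREATE INDEX as a hint that a persistent
-- index on orderid is wanted. I don't know whether MonetDB honours this
-- or still builds the thash lazily on first use.
CREATE INDEX t_order2_orderid_idx ON t_order2 (orderid);
```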
Is there some documentation I can read about the on-disk format? What is the theap file? For my orderid column it's much larger than rows x VARCHAR(6) (7?). Is it some potentially sparse free store used for memory mapping?
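For what it's worth, my mental model of how a variable-width string column might be laid out is the toy sketch below: fixed-width offsets pointing into a shared byte heap, with duplicates stored once. This is purely my guess at the general idea, not MonetDB's actual format, and the names are mine.

```python
# Toy offset-into-heap string column (my guess at the idea, NOT
# MonetDB's real on-disk layout). Each row stores a fixed-width
# offset into a shared byte heap; duplicate strings are stored once,
# so heap size tracks distinct values rather than row count.

class StringColumn:
    def __init__(self):
        self.heap = bytearray()   # shared string storage ("theap"-like)
        self.offsets = []         # one fixed-width entry per row ("tail"-like)
        self._seen = {}           # dedup map: string -> heap offset

    def append(self, s: str) -> None:
        off = self._seen.get(s)
        if off is None:
            off = len(self.heap)
            self.heap += s.encode("utf-8") + b"\x00"  # NUL-terminated entry
            self._seen[s] = off
        self.offsets.append(off)

    def get(self, row: int) -> str:
        off = self.offsets[row]
        end = self.heap.index(0, off)  # scan to the NUL terminator
        return self.heap[off:end].decode("utf-8")

col = StringColumn()
for s in ["AAAA", "BBBB", "AAAA", "CCCC"]:
    col.append(s)

print(col.get(2))     # -> AAAA (row 2 reuses the first heap entry)
print(len(col.heap))  # -> 15 (three distinct 4-byte strings + NULs)
```

If something like this is the scheme, I could see the heap growing past rows x width once per-entry overhead and allocation slack come into it, which might explain the size I'm seeing.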
Is there any other information I can provide that might help track down why my first query is so slow? I'm really keen to use MonetDB in an app I'm developing, but I need it to satisfy simple lookups as well as the lightning-fast aggregations it already does so well!