Niels Nes wrote:
On Wed, Apr 30, 2008 at 03:58:11PM +0200, Nils Grimsmo wrote:
I have tables with around 12 million rows and large primary keys. I find the performance for various queries to be rather strange.
* Why is Q2 (see below) so much slower than Q1? It is actually faster for me to use Q1 and do the rest of the matching myself.
* Why is Q3 so immensely much faster than Q1? I understand that the first part of the primary key is selective in Q3. Do I understand right if Q1 scans until "type" matches, then scans until "name" matches, then scans until "value" matches? No indexes are put on disk, right? Then the matching for Q3 should be scan until "value" matches, which shouldn't be more than at most three times faster, right (if three columns are read instead of one for Q1)? I do Q1 twice to look for cash effects, but none are seen... Could some columns from the table "value" be cached and kept, without giving room for those from "type_name_value"?
* Q4 has the same performance as Q1, as expected.
* Why isn't Q5 faster than Q6? 82% of the rows have type=3. Fewer should be scanned to find that "!!foo" doesn't exist than for "~~foo", as "!" has a low ASCII value, while "~" is high. (See COUNT queries at the bottom.)
The performance differences are related to the order in which the where conditions are applied. Please send us also the table create statement, then we can use 'explain' to see which order is taken. You could also run your queries prefixed with 'trace' to see where the time goes.
Been playing around a bit, and it seems to me like SELECTs are very slow unless the first part of the PRIMARY KEY is selective. ALTER TABLE base \ ADD PRIMARY KEY(doc, nenc, pos, complete) CREATE TABLE base2 \ AS SELECT nenc,doc,pos,type,name,pathID,complete,value \ FROM base \ ORDER BY nenc,doc,pos,complete WITH DATA ALTER TABLE base2 \ ADD PRIMARY KEY (nenc,doc,pos,complete) SELECT b.value \ FROM base AS b \ WHERE doc = 1 AND b.nenc = '1.1.1.1'; Timer 27344.739 msec SELECT b.value \ FROM _tmp2 AS b \ WHERE b.nenc = '1.1.1.1' AND doc = 1; Timer 0.773 msec The first query takes more than 35'000 times as long. All rows have doc=1. If no indexes are kept on disk, I do not understand why the first query should take more than twice as long as the second. Here is a query on base2 using LIKE instead. SELECT b.value \ FROM _tmp2 AS b \ WHERE b.nenc LIKE '1.1.1.1%' AND doc = 1; Timer 1538.009 msec As the condition is a proper prefix, I also don't understand why this is 2000 times slower, if no indexes are involved. (The number of results is the same.) According to Martin.Kersten@cwi.nl (in Message-id <4806471A.4090509@cwi.nl>), "Hash indices are used when needed and maintained under update. However, they are never saved on disk." Does this mean a hash index is never used in a SELECT? Or could it be just kept in memory if the table is not evicted? I could not find an answer in the Inner Core documentation. Klem fra Nils