
Hi Hannes, sql>select "column","sorted" from storage() where "table"='grch36'; +------------------+--------+ | column | sorted | +==================+========+ | site | false | | base | false | | grch36_site_pkey | true | +------------------+--------+ Apparently it is not recognized. Here is what the first lines of the CSV file look like: 1|N 2|N 3|N 4|N 5|N 6|N 7|N 8|N 9|N 10|N 11|N 12|N 13|N 14|N ... Sebastian On 08.08.2013 12:49, Hannes Mühleisen wrote:
Hello Sebastian,
On 08/08/2013 12:46 PM, Sebastian Dorok wrote:
I have some slow queries that I want to accelerate by utilizing information about sorting.
My table definition: create table genomics.grch36 (site bigint, base char(1)); I use COPY INTO to populate the table. The source file size is 40GB and contains more than 3 billion rows. I can guarantee that the data in file is ordered by site.
I query the data like this:
sql>select base from genomics.grch36 where site between 10000 and 10010; +------+ | base | +======+ | N | | T | | A | | A | | C | | C | | C | | T | | A | | A | | C | +------+ 11 tuples (5m 23s)
I think 5 minutes seem too much for this query. Primary key or an index on 'site' don't work or at least aren't recognized in query execution. I think MonetDB would benefit when knowing that the data is ordered by site. Wouldn't it? Actually, the COPY INTO should recognize the "sortedness" and mark the
column accordingly.
What is the output of running
select "column","sorted" from storage() where "table"='genomics.grch36';
Best,
Hannes
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list