Re: COPY INTO with ordered column

8 Aug 2013

      Hi Hannes,

sql>select "column","sorted" from storage() where "table"='grch36';
+------------------+--------+
| column           | sorted |
+==================+========+
| site             | false  |
| base             | false  |
| grch36_site_pkey | true   |
+------------------+--------+

Apparently it is not recognized.

Here is what the first lines of the CSV file look like:

1|N
2|N
3|N
4|N
5|N
6|N
7|N
8|N
9|N
10|N
11|N
12|N
13|N
14|N
...

Sebastian

On 08.08.2013 12:49, Hannes Mühleisen wrote:
...
Hello Sebastian,
On 08/08/2013 12:46 PM, Sebastian Dorok wrote:
...
I have some slow queries that I want to accelerate by utilizing
information about sorting.
My table definition: create table genomics.grch36 (site bigint, base
char(1));
I use COPY INTO to populate the table.
The source file size is 40GB and contains more than 3 billion rows.
I can guarantee that the data in file is ordered by site.
I query the data like this:
sql>select base from genomics.grch36 where site between 10000 and 10010;
+------+
| base |
+======+
| N    |
| T    |
| A    |
| A    |
| C    |
| C    |
| C    |
| T    |
| A    |
| A    |
| C    |
+------+
11 tuples (5m 23s)
I think 5 minutes seem too much for this query.
Primary key or an index on 'site' don't work or at least aren't
recognized in query execution.
I think MonetDB would benefit when knowing that the data is ordered by
site. Wouldn't it?
Actually, the COPY INTO should recognize the "sortedness" and mark the
column accordingly.
What is the output of running
select "column","sorted" from storage() where "table"='genomics.grch36';
Best,
Hannes
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list