Re: COPY INTO with ordered column

9 Aug 2013

      Hi,

sorry for stealing your time.
Indeed, the data files were corrupt and not sorted due to a bug in the 
program that generates them.

Now, everything works like it has to.
Thank you very much.

Regards,
Sebastian

On 08.08.2013 16:28, Hannes Mühleisen wrote:
...
Hello Sebastian,
I just tried this on my local installation of the latest release version
of MonetDB:
cat test.psv
1|N
2|N
3|N
4|N
5|N
...
then
create table genestuff (siteid int, base string)
copy into genestuff from '.../test.psv'
select "column","sorted" from storage() where "table"='genestuff';
+--------+--------+
| column | sorted |
+========+========+
| siteid | true   |
| base   | true   |
+--------+--------+
Having successfully loaded sorted datasets in the two-digit Gigabyte
range into MonetDB last week, I can confirm this is working fine. Please
scrutinize your data files a bit more.
Best,
Hannes
On 08/08/2013 04:17 PM, Sebastian Dorok wrote:
...
;)
I use one COPY INTO:
COPY INTO genomics.grch36 FROM '/path/to/file';
Regards,
Sebastian
On 08.08.2013 15:50, Hannes Mühleisen wrote:
...
Hello Sebastian,
On 08/08/2013 01:05 PM, Sebastian Dorok wrote:
...
sql>select "column","sorted" from storage() where "table"='grch36';
+------------------+--------+
| column           | sorted |
+==================+========+
| site             | false  |
| base             | false  |
| grch36_site_pkey | true   |
+------------------+--------+
Apparently it is not recognized.
Apparently your file is not sorted :)
Jokes aside, do you load the data in one COPY INTO run or several?
Best,
Hannes
...
Here is what the first lines of the CSV file look like:
1|N
2|N
3|N
4|N
5|N
6|N
7|N
8|N
9|N
10|N
11|N
12|N
13|N
14|N
...
On 08.08.2013 12:49, Hannes Mühleisen wrote:
...
On 08/08/2013 12:46 PM, Sebastian Dorok wrote:
...
I have some slow queries that I want to accelerate by utilizing
information about sorting.
My table definition: create table genomics.grch36 (site bigint, base
char(1));
I use COPY INTO to populate the table.
The source file size is 40GB and contains more than 3 billion rows.
I can guarantee that the data in file is ordered by site.
I query the data like this:
sql>select base from genomics.grch36 where site between 10000 and
10010;
+------+
| base |
+======+
| N    |
| T    |
| A    |
| A    |
| C    |
| C    |
| C    |
| T    |
| A    |
| A    |
| C    |
+------+
11 tuples (5m 23s)
I think 5 minutes seem too much for this query.
Primary key or an index on 'site' don't work or at least aren't
recognized in query execution.
I think MonetDB would benefit when knowing that the data is
ordered by
site. Wouldn't it?
Actually, the COPY INTO should recognize the "sortedness" and mark the
column accordingly.
What is the output of running
select "column","sorted" from storage() where
"table"='genomics.grch36';
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list