Hi Gordon,

I see. Thanks for the explanations! Well, I assume that for the export to some other system you still need a serialized textual form of the data, right? Hence, the serialization costs themselves cannot be avoided.

In case the MAPI protocol and client-side data parsing and rendering are indeed too much of an overhead even for "occasional" exports, you might want to consider the bulk export functionality as described under http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto i.e.,

   COPY subquery INTO file_name
        [ [USING] DELIMITERS field_separator [',' record_separator [',' string_quote]] ]
        [ NULL AS null_string ]

This will generate a (possibly even compressed) "DSV" (delimiter-separated values) file --- obviously in the file system of your server. But if you wrap this functionality in an "export" button on your website (rather than letting users type SQL literally), you can easily place the file somewhere where the user can download it via http, ftp, or the like ...
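For instance, something along these lines (just a sketch: the table name and file path are made up for illustration; the path must be absolute and writable by the server process):

   COPY SELECT * FROM genome_data
   INTO '/tmp/genome_data_export.csv'
   USING DELIMITERS ',', '\n', '"'
   NULL AS '';

If I am not mistaken, choosing a file name that ends in ".gz" should make the server write the file gzip-compressed, which might help with exports of that size.

Hope this helps ...

Stefan

----- Original Message -----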
Hello Stefan,
On 10/17/2012 04:05 AM, Stefan Manegold wrote:
indeed, your experiments confirm that most of the time is spent in serializing, sending, and re-parsing your query's huge result.
I am wondering what kind of application needs to see the entire database table?
I'm working on a website that will provide some analysis on genomic data. Most of the time, users are indeed interested in grouping/counting/summing and similar aggregation of the data - which MonetDB does exceptionally well.
However, every now and then, users will want to export the data (either the entire table, or a large chunk of the raw data) and carry on the analysis on a different platform.
No user can seriously handle millions of result tuples with tens (or more) of columns? What is the purpose of using a DBMS if all it needs to do is "select * from table"?
I don't expect them to open the data in Excel, of course. Users can export it to other web-based platforms (e.g. Galaxy[1], GenePattern[2], GenomeSpace[3]), or save it to a file and run R scripts on the data - those are designed to handle huge data files without a problem.
[1] Galaxy - http://galaxy.psu.edu/
[2] GenePattern - http://www.broadinstitute.org/cancer/software/genepattern/
[3] GenomeSpace - http://www.genomespace.org/
In my/our opinion, as much as possible of the application logic should be expressed in SQL (plus domain-/application-specific extensions, if required) and (thus) performed in the DBMS, resulting in only small results that need to be sent back to the client/application. Then the efficiency and performance of the DBMS (server) itself dominates and is more important than the server-client communication.
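As a made-up illustration (table and column names invented here): instead of shipping all rows to the client and counting there, an aggregation like

   SELECT chromosome, COUNT(*) AS variant_count
     FROM genome_data
    GROUP BY chromosome;

runs entirely inside the DBMS and sends back only a handful of rows, so the server-client communication cost becomes negligible.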
Generally I agree. But the reality is that my website will provide a specific (limited) set of functions over the data, and to continue with downstream analysis the users will need to be able to take the data elsewhere - I want to make it as easy as possible for them.
I'll investigate other ways to extract the data.
Thanks,
-gordon
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list