Hi Gordon,

I see. Thanks for the explanations! Well, I assume that for the export to some other system you still need a serialized textual form of the data, right? Hence, the serialization costs themselves cannot be avoided.

In case the MAPI protocol and client-side data parsing and rendering are indeed too much of an overhead even for "occasional" exports, you might want to consider the bulk export functionality as described under http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto i.e.,

   COPY subquery INTO file_name
        [ [USING] DELIMITERS field_separator [',' record_separator [',' string_quote]] ]
        [ NULL AS null_string ]

This will generate a (possibly even compressed) "DSV" (delimiter-separated values) file --- obviously in the file system of your server. But if you wrap this functionality in an "export" button on your website (rather than letting users type SQL literally), you can easily place the file somewhere where the user can download it via http, ftp, or the like ...
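For instance, something along these lines (just a sketch: the table name and file path are made up for illustration; the path must be absolute and writable by the server process):

   COPY SELECT * FROM genome_data
   INTO '/tmp/genome_data_export.csv'
   USING DELIMITERS ',', '\n', '"'
   NULL AS '';

If I am not mistaken, choosing a file name that ends in ".gz" should make the server write the file gzip-compressed, which might help with exports of that size.

Hope this helps ...

Stefan

----- Original Message -----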
Hello Stefan,
On 10/17/2012 04:05 AM, Stefan Manegold wrote:
indeed, your experiments confirm that most of the time is spent in serializing, sending, and re-parsing your query's huge result.
I am wondering what kind of application needs to see the entire database table?
I'm working on a website that will provide some analysis on genomic data. Most of the time, users are indeed interested in grouping/counting/summing and similar aggregation of the data - which MonetDB does exceptionally well.
However, every now and then, users will want to export the data (either the entire table, or a large chunk of the raw data) and carry on the analysis on a different platform.
No user can seriously handle millions of result tuples with tens (or more) of columns? What is the purpose of using a DBMS if all it needs to do is "select * from table"?
I don't expect them to open the data in Excel, of course. Users can export it to other web-based platforms (e.g. Galaxy[1], GenePattern[2], GenomeSpace[3]), or save it to a file and run R scripts on the data - those are designed to handle huge data files without a problem.
[1] Galaxy - http://galaxy.psu.edu/
[2] GenePattern - http://www.broadinstitute.org/cancer/software/genepattern/
[3] GenomeSpace - http://www.genomespace.org/
In my/our opinion, as much as possible of the application logic should be expressed in SQL (plus domain-/application-specific extensions, if required) and (thus) performed in the DBMS, resulting in only small results that need to be sent back to the client/application. Then the efficiency and performance of the DBMS (server) itself dominates and is more important than the server-client communication.
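As a made-up illustration (table and column names invented here): instead of shipping all rows to the client and counting there, an aggregation like

   SELECT chromosome, COUNT(*) AS variant_count
     FROM genome_data
    GROUP BY chromosome;

runs entirely inside the DBMS and sends back only a handful of rows, so the server-client communication cost becomes negligible.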
Generally I agree. But the reality is that my website will provide a specific (limited) set of functions over the data, and to continue with downstream analysis the users will need to be able to take the data elsewhere - I want to make it as easy as possible for them.
I'll investigate other ways to extract the data.
Thanks,
-gordon
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list