Improving performance of "select * from table" ?
Hello, I'm getting phenomenal performance with MonetDB with almost all of my query types, except one: The most basic "SELECT * FROM mytable WHERE (SOMETHING)" query takes much longer than other RDBMs (mysql, postgresql). Even "SELECT * FROM mytable LIMIT 10;" takes significantly more time than with other databases (and the more columns the table has, the more time it takes). I understand this is not the scenario that MonetDB was designed to address, but perhaps there's a way to improve it? (It's a requirement for my project to be able to export some/all of the data in a table.) Thanks, -gordon _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Hai Gordon, How many columns and rows does you table have? How many rows does the first query return? what are the execution times of this query on mysql, postgresql and monetdb, respectively? Regards, Jennie On Oct 16, 2012, at 20:15 , Assaf Gordon wrote:
Hello,
I'm getting phenomenal performance with MonetDB with almost all of my query types, except one:
The most basic "SELECT * FROM mytable WHERE (SOMETHING)" query takes much longer than other RDBMs (mysql, postgresql).
Even "SELECT * FROM mytable LIMIT 10;" takes significantly more time than with other databases (and the more columns the table has, the more time it takes).
I understand this is not the scenario that MonetDB was designed to address, but perhaps there's a way to improve it? (It's a requirement for my project to be able to export some/all of the data in a table.)
Thanks, -gordon _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Hello Jennie, Ying Zhang wrote, On 10/16/2012 04:46 PM:
How many columns and rows does you table have? How many rows does the first query return? what are the execution times of this query on mysql, postgresql and monetdb, respectively?
Below are some test results. Both servers (postgres and mserver) and clients (psql and mclient) are running on the same machine, with 8 cores and 16GB of RAM. The machine was not overloaded during the tests. The databases are stored on the same disk. I've run each test (except 5) couple of times, and listed most common result (so it's not about caching). Using PostgreSQL 9.2.1 (compiled from source) and MonetDB v11.14.0/unreleased (compiled from source). I can provide more technical details if needed. Assuming the "user" time is the time "wasted" by the client program (psql/mclient), and assuming "sys" time is negligible, then subtracting the "user" time from the "real" time gives an indication of how much time the server took to complete the query and transfer the results. It seems in most of those queries, the MonetDB server takes longer than PostgreSQL to complete. One striking difference is test #6 that needs to return just one line: the execution time for MonetDB is 9 seconds - I guess accessing 138 columns takes its toll... ## ## First table (bc251): 1,657,032 rows, 13 columns ## ## MonetDB Table size sql>select "schema", "table", sum("size") as size from storage() group by "schema","table" having "table" = 'bc251' ; +--------+-------+-----------+ | schema | table | size | +========+=======+===========+ | sys | bc251 | 132775936 | +--------+-------+-----------+ ## PostgreSQL Table Sql test=# SELECT pg_size_pretty(pg_relation_size('bc251')); pg_size_pretty ---------------- 212 MB (1 row) ## Test 1: select ONE column ## MonethDB $ time mclient -d voc -s "select seq from bc251;" > /dev/null real 0m15.981s user 0m5.653s sys 0m0.106s ## PostgreSQL $ time echo "select seq from bc251" | psql test > /dev/null real 0m3.565s user 0m2.698s sys 0m0.154s ## Test 2: select ALL columns ## MonetDB $ time mclient -d voc -s "select * from bc251;" > /dev/null real 2m16.388s user 0m38.468s sys 0m0.448s ## PostgreSQL $ time echo "select * from bc251" | psql test > /dev/null real 0m21.771s user 0m17.045s sys 0m0.843s ## Test 3: Select ONE row ## MonetDB $ time mclient -d voc -s "select * from bc251 limit 1" > /dev/null real 0m0.053s user 0m0.003s sys 0m0.004s ## PostgreSQL $ time echo "select * from bc251 limit 1" | psql test > /dev/null real 0m0.072s user 0m0.013s sys 0m0.009s ## ## Second table (NR102): 3,042,983 rows, 138 columns ## ## MonetDB Table Size sql>select "schema", "table", sum("size") as size from storage() group by "schema","table" having "table" = 'nr102' ; +--------+-------+------------+ | schema | table | size | +========+=======+============+ | sys | nr102 | 1408630784 | +--------+-------+------------+ 1 tuple (5.567ms) ## PostgreSQL Table Size test=# SELECT pg_size_pretty(pg_relation_size('NR102')); pg_size_pretty ---------------- 1850 MB (1 row) ## Test 4: Select ONE column ## MonetDB $ time mclient -d voc -s "select seq from NR102;" > /dev/null real 0m29.882s user 0m7.574s sys 0m0.107s ## PostgreSQL $ time echo "select seq from NR102" | psql test > /dev/null real 0m6.925s user 0m4.906s sys 0m0.268s ## Test 5: Select ALL columns ## MonetDB $ time mclient -d voc -s "select * from NR102;" > /dev/null real 39m58.618s user 10m9.066s sys 0m4.468s ## PostgreSQL $ time echo "select * from NR102" | psql test > /dev/null real 4m0.840s user 3m20.866s sys 0m12.592s ## Test 6: Select ONE row ## MonetDB $ time mclient -d voc -s "select * from NR102 limit 1" > /dev/null real 0m9.055s user 0m0.004s sys 0m0.008s ## PostgreSQL $ time echo "select * from NR102 limit 1" | psql test > /dev/null real 0m0.160s user 0m0.016s sys 0m0.006s _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
On Tue, 16 Oct 2012, Assaf Gordon wrote:
Assuming the "user" time is the time "wasted" by the client program (psql/mclient), and assuming "sys" time is negligible, then subtracting the "user" time from the "real" time gives an indication of how much time the server took to complete the query and transfer the results.
What about: the time taken to serialise the results? For MonetDB this is done in a very 'plain text' wire-format. Are you aware of the PostgreSQL \timing command? In that way you can see outputs similar to that of MonetDB with respect of time taken for a query. Stefan _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Stefan de Konink wrote, On 10/16/2012 06:49 PM:
On Tue, 16 Oct 2012, Assaf Gordon wrote:
Assuming the "user" time is the time "wasted" by the client program (psql/mclient), and assuming "sys" time is negligible, then subtracting the "user" time from the "real" time gives an indication of how much time the server took to complete the query and transfer the results.
What about: the time taken to serialise the results? For MonetDB this is done in a very 'plain text' wire-format.
Indeed, that could explain some of the differences, but: 1. I need to take the entire process into account, because whether I use "mclient" or Perl (for my website), the results will have to be transferred somehow. If the serialization is the bottle-neck, perhaps there are other ways to work around it? 2. Serialization alone doesn't explain test #6, where a query for a single row with 138 columns takes 9 seconds to complete.
Are you aware of the PostgreSQL \timing command? In that way you can see outputs similar to that of MonetDB with respect of time taken for a query.
Thanks for the tip, I'll try to rerun the tests with that. -gordon _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Hello Stefan,
Stefan de Konink wrote, On 10/16/2012 06:49 PM:
[...] What about: the time taken to serialise the results? For MonetDB this is done in a very 'plain text' wire-format. [...] Are you aware of the PostgreSQL \timing command? In that way you can see outputs similar to that of MonetDB with respect of time taken for a query.
I re-run one test using psql's and mclient's internal timing. === ## Postgres: Select all columns from table $ psql test test=# \o /dev/null test=# \timing Timing is on. test=# select * from bc251; Time: 5506.768 ms test=# select * from bc251; Time: 5446.689 ms test=# ## MonetDB: Select all columns from table $ mclient -d voc sql>\>/dev/null sql>\w-1 sql>select * from bc251; 1657032 tuples (3.8s) sql>select * from bc251; 1657032 tuples (3.8s) sql> === As you said, the reported durations are much shorter than in my previous timings (monetDb: 3.8s vs 2m16s, postgres 5.5s vs 21s). However, those numbers don't reveal the entire story: mclient reports 3.8 seconds, but it took more than 2 minutes (of wall time) for the result message to be printed. I understand that this quite possibly indicates a serialization/transfer bottle-neck, and it could be of lesser interest to the developers, but as an application developer I worry about the end-to-end timing of the query - the time it takes for my application to get the data back. If there are any recommendation on improving those queries, it would be much appreciated. Thanks, -gordon _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Assaf Gordon
-
Stefan de Konink
-
Ying Zhang