Client PHP too slow with many results

Hello, I'm finding the PHP client too slow at returning big result sets. I'm using the PHP client I downloaded from http://dev.monetdb.org/downloads/sources/Latest/ (MonetDB-11.13.3.tar.bz2). Some benchmarks below; any way to get the results faster?

LIMIT 50K:
[root@monetdb examples]# php simple_query.php
9
[root@monetdb examples]# time mclient -u adctest -d adctest --statement='SELECT * FROM summary_sent LIMIT 50000' > res.txt
password:
real 0m4.566s  <- already includes some time for the password input
user 0m1.272s
sys  0m0.132s

LIMIT 100K:
[root@monetdb examples]# time mclient -u adctest -d adctest --statement='SELECT * FROM summary_sent LIMIT 100000' > res.txt
password:
real 0m7.184s
user 0m2.545s
sys  0m0.266s
[root@monetdb examples]# php simple_query.php
18

Code:
[root@monetdb examples]# cat simple_query.php

Web: www.adclickint.com

_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list

Hello Eduardo

The core problem might be your query itself. It is a row-store query ;) It is the extreme case encountered in a column store, where the benefits come from accessing a limited number of columns or from performing aggregations. Consider queries such as "SELECT grp, sum(attr) FROM summary_sent WHERE predicate GROUP BY grp" or "SELECT attr1, attr2 FROM summary_sent WHERE predicate LIMIT 50000;"

Your observation about PHP client performance most likely has little to do with the PHP interface to MonetDB. To check it, split the timing into the actual query execution in the server and the retrieval loop. (Also be aware of cold vs. hot query processing.) That said, I am not sure about the buffering scheme deployed in our PHP code, which may require some updates. In general, sending twice the amount of data over the wire will take twice the amount of time to process.

regards, Martin

On 11/15/12 5:53 PM, Eduardo Oliveira wrote:
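Martin's suggestion to split the timing could be sketched as below. The monetdb_connect(), monetdb_query() and monetdb_fetch_assoc() function names are assumed from the php_monetdb.php driver shipped with the MonetDB sources; their exact signatures, and the connection parameters used here, are assumptions, and the sketch needs a running server:

```php
<?php
// Hedged sketch: separate the server-side query cost from the
// client-side retrieval cost. Driver function names are assumptions
// taken from the bundled php_monetdb.php; connection values are placeholders.
require 'php_monetdb.php';

$db = monetdb_connect('sql', '127.0.0.1', 50000, 'adctest', 'adctest', 'adctest');

$t0  = microtime(true);
$res = monetdb_query($db, 'SELECT * FROM summary_sent LIMIT 50000');
$t1  = microtime(true);                // query executed, first reply received

$rows = 0;
while (monetdb_fetch_assoc($res)) {    // drain the result set, discard rows
    $rows++;
}
$t2 = microtime(true);                 // retrieval loop finished

printf("query + first reply: %.3fs, fetching %d rows: %.3fs\n",
       $t1 - $t0, $rows, $t2 - $t1);
```

If the second number dominates, the cost is in the client-side transfer and decoding, not in the server's query evaluation.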

Hello Martin, thank you for the quick answer.
I will try to explain my situation better. I have a big table that will
grow by around 20-30M rows a day (inserts only; I will create a new CSV and
load it every 20-30 minutes). Then I have a small table that is currently 100K
rows and will probably be 200K in 1-2 years; this one gets a lot of updates. The
big table is very easy to shard by date, since it serves reports where the
user always has to choose a start date and an end date.
The final reports are made from the big table joined with the small table, and
have around 100-500 rows (sometimes presented as a table, sometimes as a
graph).
But because I will shard the big table across different servers, it is not easy
to keep the small table on all of them: it gets a lot of updates, and I read
that updates in MonetDB are not really in-place updates, just another structure
recording what was updated.
So the idea is to have a PHP HTTP API that receives a query and queries
whichever MonetDB shards are needed, but it can return much more than 100-500
rows, up to 10K rows, because the result is not yet joined with the small
table. One GROUP BY column exists only in the small table, and it is difficult
to put that column in the big table to avoid the join, because it needs a lot
of updates. The join would then be done in the application, or using a MySQL
MEMORY table, where writes are very fast.
The times I gave below are all local, so some networking time will be added,
but being this slow to get 10K results is really a show stopper.
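The application-side join described above could be sketched with a simple hash join in PHP: index the small table by its key once, then enrich the (up to 10K) shard rows with constant-time lookups. All table and column names here are invented for illustration; they are not from the original thread.

```php
<?php
// Hypothetical sketch of an application-side hash join.

/** Build a key => row index over the small table's rows. */
function index_by(array $rows, string $key): array
{
    $idx = [];
    foreach ($rows as $row) {
        $idx[$row[$key]] = $row;
    }
    return $idx;
}

/** Left-join $bigRows against the index on $key; one O(1) lookup per row. */
function hash_join(array $bigRows, array $smallIdx, string $key): array
{
    $out = [];
    foreach ($bigRows as $row) {
        // Array union: keys already in $row win; small-table columns are added.
        $out[] = $row + ($smallIdx[$row[$key]] ?? []);
    }
    return $out;
}

// Stand-ins for the two result sets (shard rows and the small lookup table).
$small  = [['cid' => 1, 'campaign' => 'A'], ['cid' => 2, 'campaign' => 'B']];
$big    = [['cid' => 1, 'clicks' => 10], ['cid' => 2, 'clicks' => 7]];
$joined = hash_join($big, index_by($small, 'cid'), 'cid');
```

PHP's `+` operator on arrays keeps the left operand's keys, so big-table columns are never overwritten by the lookup row; rows with no match in the small table pass through unchanged.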
2012/11/15 Martin Kersten
--
Eduardo Oliveira
IT
Email: eduardo.oliveira@adclick.pt

Dear Eduardo,

Based on our own experience in the past, and as reported independently in the German CS magazine a few years back, the PHP protocol is only slightly worse than the very optimized version in MySQL.

Your application is an append-mostly database, which does not cause any harm to performance. For highly volatile updates, the optimistic concurrency scheme in combination with the delta storage is indeed not optimal. But given the small table size, a regular rebuild creating a fresh copy of the small table would deal with it.

To proceed and attack your problems, the cause must be identified first, e.g. move the timing in your script to separate the cost of receiving and analysing the result set in the PHP API from the server's query evaluation cost. Note that by the time the first record is sent over the wire, the complete result is already available within the server. You might also look at the "funnel" feature, which would allow you to combine the results of several servers in an easy way. Of course, solving a join in the application is the last resort.

You could help us by making parts of your application known, e.g. schema and data, and by identifying in quantitative terms where MonetDB could improve over its competitors, e.g. how fast is MySQL in tackling your problem.

regards, Martin

On 11/15/12 11:04 PM, Eduardo Oliveira wrote:

On 15-11-2012 23:28:01 +0100, Martin Kersten wrote:
This was with the version that back then relied on the C implementation (mapi) of the protocol. Since this proved to be unmaintainable, we switched to a native PHP implementation. This is obviously much less suited to achieving high performance with large data volumes shipped over the wire, as you observed in your initial post.

Fabian

--
Fabian Groffen
fabian@monetdb.org
column-store pioneer
http://www.monetdb.org/Home

Fabian, I understand that, thank you for the help. I will have to work
around it by bringing less data over the wire.
2012/11/16 Fabian Groffen
participants (4)
- Eduardo Oliveira
- Fabian Groffen
- Martin Kersten
- Martin Kersten