Re: [MonetDB-users] Order of WHERE clause has significant performance impact
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table? If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra. Niels
-Ross
Niels - thanks for the response. I tried using the -G switch and the performance time was the same for both versions of the query. The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference. 3.3 vs .4 when varchar 1.9 vs .2 when int CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int ) CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int ) Would it introduce any issues or risks to have the optimizer reorder the execution of each WHERE clause based on the number of predicates specified? I am still learning about the optimizer, but would changing any steps in the default SQL optimization sequence help in this situation? Thanks again, Ross
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels
-Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Would it introduce any issues or risks to have the optimizer reorder the execution of each WHERE clause based on the number of predicates specified?
Should be possible. Niels
I am still learning about the optimizer, but would changing any steps in the default SQL optimization sequence help in this situation?
Thanks again, Ross
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Thu, Aug 14, 2008 at 07:20:40PM +0200, Niels Nes wrote:
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels
-Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Both queries on the 'algebra version' (-G) execute the exacte same plan. Question now is, is this the fast or the slow speed as compared to the old q1/q2 runs? Niels
On Thu, Aug 14, 2008 at 12:38 PM, Niels Nes
On Thu, Aug 14, 2008 at 07:20:40PM +0200, Niels Nes wrote:
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels
-Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Both queries on the 'algebra version' (-G) execute the exacte same plan. Question now is, is this the fast or the slow speed as compared to the old q1/q2 runs?
Niels
Neither query ran faster with the -G switch. They both ran at the same speed as without the switch. The version I'm running is v5.6.0, based on kernel v1.24.0 and was compiled using the June SuperBall. I have downloaded the August file and will compile and test again. To clarify, in your previous email you mentioned restarting the server. From what I can tell the '-G' switch is specific to mclient and therefore restarting the server shouldn't matter correct? Or is the mserver5 process aware of this switch as well? I'm coming to this conclusion that it's client only by comparing the output of these 2 commands: ./mclient --help ./mserver5 --help -Ross
On Thu, Aug 14, 2008 at 12:48:54PM -0500, Ross Bates wrote:
On Thu, Aug 14, 2008 at 12:38 PM, Niels Nes
wrote: On Thu, Aug 14, 2008 at 07:20:40PM +0200, Niels Nes wrote:
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels
-Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Both queries on the 'algebra version' (-G) execute the exacte same plan. Question now is, is this the fast or the slow speed as compared to the old q1/q2 runs?
Niels
Neither query ran faster with the -G switch. They both ran at the same speed as without the switch.
The version I'm running is v5.6.0, based on kernel v1.24.0 and was compiled using the June SuperBall. I have downloaded the August file and will compile and test again.
To clarify, in your previous email you mentioned restarting the server. From what I can tell the '-G' switch is specific to mclient and therefore restarting the server shouldn't matter correct? Or is the mserver5 process aware of this switch as well? I'm coming to this conclusion that it's client only by comparing the output of these 2 commands: Well the server caches 'query plans' so it will reuse the plan from before the -G switch (even new clients). So a server restart will help. But most of the fixes for this new algebra version went into the current (cvs head or nightly), ie you will not see them with 5.6.0 release.
Niels
./mclient --help ./mserver5 --help
-Ross
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Thu, Aug 14, 2008 at 1:00 PM, Niels Nes
On Thu, Aug 14, 2008 at 12:48:54PM -0500, Ross Bates wrote:
On Thu, Aug 14, 2008 at 12:38 PM, Niels Nes
wrote: On Thu, Aug 14, 2008 at 07:20:40PM +0200, Niels Nes wrote:
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote:
Hi All - I have 2 queries which run very different based on how I order the WHERE clause.
When I put the larger filter first, it runs for 3.5 seconds.
Query 1:
select c5, c6, sum(sale) from sobi.sample where c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) and c6 in (5182) group by c5, c6;
When I reverse the order of the WHERE clause, it runs in about .4 seconds.
Query 2:
c5, c6, sum(sale) from sobi.sample where c6 in (5182) and c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) group by c5, c6;
This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I built from the super tarball.
I can attach the entire trace files for each query if appropriate for the list.
While I am still learning to read the trace, it looks like most of the time spent in the first query on a series of algebra.semijoin operations. Here are the excerpted lines:
Query 1:
[2320345 usec # _461 := algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, _460=<~tmp_2016>bat[:str,:oid]{142}) ] [776017 usec # _463 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] [253551 usec # _470 := algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, _469=<~tmp_2016>bat[:str,:oid]{1}) ] Query 2:
[412122 usec # _179 := algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, _178=<~tmp_2016>bat[:str,:oid]{1}) ] [ 194 usec # _181 := algebra.semijoin(_158=
bat[:oid,:str]{8280020}, _180=<~tmp_1572>bat[:oid,:str]{304}) ] [ 0 usec # _470 := algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, _469=<~tmp_2016>bat[:str,:oid]{142}) ] Any assistance would be greatly appreciated.
Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels
-Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Both queries on the 'algebra version' (-G) execute the exacte same plan. Question now is, is this the fast or the slow speed as compared to the old q1/q2 runs?
Niels
Neither query ran faster with the -G switch. They both ran at the same speed as without the switch.
The version I'm running is v5.6.0, based on kernel v1.24.0 and was compiled using the June SuperBall. I have downloaded the August file and will compile and test again.
To clarify, in your previous email you mentioned restarting the server. From what I can tell the '-G' switch is specific to mclient and therefore restarting the server shouldn't matter correct? Or is the mserver5 process aware of this switch as well? I'm coming to this conclusion that it's client only by comparing the output of these 2 commands: Well the server caches 'query plans' so it will reuse the plan from before the -G switch (even new clients). So a server restart will help. But most of the fixes for this new algebra version went into the current (cvs head or nightly), ie you will not see them with 5.6.0 release.
Niels
Ok, I understand now. I was using the official release from sourceforge, I have just learned how I can download and build from the nightly directory. I have built the August 14 super tar ball, loaded the table, made sure all queries ran ok, restarted the server, then ran them both again with the -G switch. Unfortunately the total usec in the trace file are the same as before for both queries. While I don't have the skills yet to start hacking the optimizer, if you all agree it's a viable and useful option I am more than happy to help with testing or anything else which would be of use. In the mean time I can always "optimize" at the application level.
On Thu, Aug 14, 2008 at 06:47:43PM -0500, Ross Bates wrote:
On Thu, Aug 14, 2008 at 1:00 PM, Niels Nes
wrote: On Thu, Aug 14, 2008 at 12:48:54PM -0500, Ross Bates wrote:
On Thu, Aug 14, 2008 at 12:38 PM, Niels Nes
wrote: On Thu, Aug 14, 2008 at 07:20:40PM +0200, Niels Nes wrote:
On Thu, Aug 14, 2008 at 11:22:15AM -0500, Ross Bates wrote:
On Tue, Aug 12, 2008 at 08:31:22PM -0500, Ross Bates wrote: > Hi All - I have 2 queries which run very different based on how I > order the WHERE clause. > > When I put the larger filter first, it runs for 3.5 seconds. > > Query 1: > > select c5, c6, sum(sale) from sobi.sample > where > c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) > and c6 in (5182) > group by c5, c6; > > > When I reverse the order of the WHERE clause, it runs in about .4 seconds. > > > Query 2: > > c5, c6, sum(sale) from sobi.sample > where > c6 in (5182) > and > c5 in (5253,5110,5375,5268,5257,5266,5243,5256,5252,5361,5379,5363,5386,5367,5240,5241,5261,5249,5245,5260,5247,5264,5354,5250,5373,5366,5246,5356,5365,5255,5262,5259,5251,5369,5357,5269,5364,5380,5244,5248,5242,5254,5358,5360,5263,5362,5374,5378,5370,5294,5339,5272,5277,5275,5295,5340,5273,5276,5270,5061,5081,5079,5080,5066,5056,5073,5060,5070,5099,5101,5100,5067,5063,5077,5072,5062,5075,5083,5086,5071,5095,5058,5087,5090,5102,5084,5098,5076,5059,5074,5094,5091,5065,5069,5068,5104,5109,5107,5108,5013,5019,5022,5010,5009,5023,5011,5024,5012,5021,5007,5008,5025,5020,5055,5006,5036,5038,5054,5032,5033,5031,5039,5035,5026,5027,5030,5200,5043,5005,5041,5049,5042,5052,5040,5053,5045,5047,5048,5050,5051,5044,5046) > group by c5, c6; > > This in on MonetDB server v5.6.0, based on kernel v1.24.0 which I > built from the super tarball. > > I can attach the entire trace files for each query if appropriate for the list. > > While I am still learning to read the trace, it looks like most of the > time spent in the first query on a series of algebra.semijoin > operations. Here are the excerpted lines: > > Query 1: > > [2320345 usec # _461 := > algebra.semijoin(_167=<~tmp_2116>bat[:str,:oid]{8280020}, > _460=<~tmp_2016>bat[:str,:oid]{142}) ] > [776017 usec # _463 := > algebra.semijoin(_158=
bat[:oid,:str]{8280020}, > _462=<~tmp_1572>bat[:oid,:str]{5135708}) ] > [253551 usec # _470 := > algebra.semijoin(_464=<~tmp_2116>bat[:str,:oid]{5135708}, > _469=<~tmp_2016>bat[:str,:oid]{1}) ] > > > Query 2: > > [412122 usec # _179 := > algebra.semijoin(_167=<~tmp_2076>bat[:str,:oid]{8280020}, > _178=<~tmp_2016>bat[:str,:oid]{1}) ] > [ 194 usec # _181 := > algebra.semijoin(_158= bat[:oid,:str]{8280020}, > _180=<~tmp_1572>bat[:oid,:str]{304}) ] > [ 0 usec # _470 := > algebra.semijoin(_182=<~tmp_2076>bat[:str,:oid]{304}, > _469=<~tmp_2016>bat[:str,:oid]{142}) ] > > > Any assistance would be greatly appreciated. Indeed the optimizer isn't smart enough to use the correct (most selective) in first. Could you send the create statement for the sobi.sample table?
If possible could you try this on the 'current' and with mclient -lsql -G. That last -G enables the new implementation which first translates sql into a relational algebra and then in binary algebra.
Niels > > -Ross
Niels - thanks for the response.
I tried using the -G switch and the performance time was the same for both versions of the query. The -G helps or gives the same performance as the non -G version? Did you restart the server and used the latest sources (ie current)?
The table is pretty simple (see below). I've tried the query against the table using both int and varchar for the columns being used in the WHERE clause. When c6 and C7 are declared as ints the queries run faster, but the order of the WHERE clause still creates a difference.
3.3 vs .4 when varchar 1.9 vs .2 when int
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 int, c7 int, c8 varchar(12), c9 varchar(12), sale int )
CREATE TABLE "sobi"."sample" ( c1 char(1), c2 varchar(24), c3 varchar(12), c4 varchar(12), c5 varchar(12), c6 varchar(12), c7 varchar(12), c8 varchar(12), c9 varchar(12), sale int )
Both queries on the 'algebra version' (-G) execute the exacte same plan. Question now is, is this the fast or the slow speed as compared to the old q1/q2 runs?
Niels
Neither query ran faster with the -G switch. They both ran at the same speed as without the switch.
The version I'm running is v5.6.0, based on kernel v1.24.0 and was compiled using the June SuperBall. I have downloaded the August file and will compile and test again.
To clarify, in your previous email you mentioned restarting the server. From what I can tell the '-G' switch is specific to mclient and therefore restarting the server shouldn't matter correct? Or is the mserver5 process aware of this switch as well? I'm coming to this conclusion that it's client only by comparing the output of these 2 commands: Well the server caches 'query plans' so it will reuse the plan from before the -G switch (even new clients). So a server restart will help. But most of the fixes for this new algebra version went into the current (cvs head or nightly), ie you will not see them with 5.6.0 release.
Niels
Ok, I understand now. I was using the official release from sourceforge, I have just learned how I can download and build from the nightly directory.
I have built the August 14 super tar ball, loaded the table, made sure all queries ran ok, restarted the server, then ran them both again with the -G switch. Unfortunately the total usec in the trace file are the same as before for both queries.
Using your create statements and queries I could see that the plans for both q1 and q2 are equal, ie they should be the same speed. If not something is wrong, ie wrong version or so. Question remaining is are the queries both on the same 'too' slow speed or are the fast. Niels
While I don't have the skills yet to start hacking the optimizer, if you all agree it's a viable and useful option I am more than happy to help with testing or anything else which would be of use. In the mean time I can always "optimize" at the application level.
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Niels Nes wrote:
I have built the August 14 super tar ball, loaded the table, made sure all queries ran ok, restarted the server, then ran them both again with the -G switch. Unfortunately the total usec in the trace file are the same as before for both queries.
Using your create statements and queries I could see that the plans for both q1 and q2 are equal, ie they should be the same speed. If not something is wrong, ie wrong version or so. Question remaining is are the queries both on the same 'too' slow speed or are the fast.
How about the compiler optimizer settings? e.g. --enable-optimize and friends
On 15-08-2008 09:08:23 +0200, Martin Kersten wrote:
Niels Nes wrote:
Using your create statements and queries I could see that the plans for both q1 and q2 are equal, ie they should be the same speed. If not something is wrong, ie wrong version or so. Question remaining is are the queries both on the same 'too' slow speed or are the fast.
How about the compiler optimizer settings? e.g. --enable-optimize and friends
The output of `mserver5 --version` will help a lot in that respect.
On Fri, Aug 15, 2008 at 2:37 AM, Fabian Groffen
On 15-08-2008 09:08:23 +0200, Martin Kersten wrote:
Niels Nes wrote:
Using your create statements and queries I could see that the plans for both q1 and q2 are equal, ie they should be the same speed. If not something is wrong, ie wrong version or so. Question remaining is are the queries both on the same 'too' slow speed or are the fast.
How about the compiler optimizer settings? e.g. --enable-optimize and friends
The output of `mserver5 --version` will help a lot in that respect.
I recompiled the server without the optimization flag, the output of mserver5 --version now just looks like this: Compilation: gcc -O2 -std=c99 -fgnu89-inline The performance difference between the 2 queries remains the same. Average of .4 seconds when the single predicate is first, average of 1.9 seconds when it's placed last. One other thing to note, the -G flag doesn't appear to have an impact one way or the other when using mclient., I get the same results. Also, when I use jdbc/squirrel I get the same results as well.
participants (4)
-
Fabian Groffen
-
Martin Kersten
-
Niels Nes
-
Ross Bates