Field conversion sticks to subsequent queries
Hi all, I have a table with an INT column. The table is sorted by that column, and monetdb has recognized the sort order. While performing this query: -------------- SELECT * FROM table WHERE column = 12345678900; -------------- monetdb converts
On Wed, Jul 08, 2015 at 03:11:26PM +0200, Dennis Lorson wrote:
Hi all,
I have a table with an INT column. The table is sorted by that column, and monetdb has recognized the sort order.
While performing this query:
-------------- SELECT * FROM table WHERE column = 12345678900; --------------
monetdb converts
to BIGINT/LNG as evidenced by the trace: -------------- 3612684 | X_17=
[89617785] := batcalc.lng(X_16= [89617785]); -------------- This seems logical behavior since the condition value would overflow INT. (Of course the result set is empty by definition, but it’s a valid query)
However, the same happens when I then perform subsequent queries with values that *can* be represented by INT:
-------------- SELECT * FROM table WHERE column = 123; --------------
It is only when an explicit INT value is entered, that the type conversion is stopped for that query and subsequent ones:
-------------- SELECT * FROM table WHERE column = CAST(123 AS INT); --------------
In other words, once the type conversion is performed once, it seems to “stick” to subsequent queries.
Type conversion on that column takes ~4s on a 90MM row table here, while the regular query time is usually sub-millisecond. Entering the “wrong" value once will influence the DB performance heavily until restarted, which is far from a desirable scenario. Of course I could always clamp my value to INT range at the application level, but this is brittle, and the DB could do the right thing here.
Is there a design consideration that I’m missing, or should I file this as a bug/performance issue? Dennis
This has to do with the query cache, but I agree this could be considered a bug, ie could you file it on bugs.monetdb.org. Niels
Thanks,
Dennis _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Sorry for jumping in with a newbie question, but isn't the first case Dennis describes also a bug, or at least a mis-feature? That is, if the column is an INT, and the condition value does not equal any possible value of INT [1] - shouldn't MonetDB notice this fact and produce an empty BAT as the result without bothering convert the column? Eyal [1] - Unless it is force-cast as INT ignoring the overflow; but I would assume that is not what SQL semantics require. On 08/07/2015 16:38, Niels Nes wrote:
On Wed, Jul 08, 2015 at 03:11:26PM +0200, Dennis Lorson wrote:
Hi all,
I have a table with an INT column. The table is sorted by that column, and monetdb has recognized the sort order.
While performing this query:
-------------- SELECT * FROM table WHERE column = 12345678900; --------------
monetdb converts
to BIGINT/LNG as evidenced by the trace: -------------- 3612684 | X_17=
[89617785] := batcalc.lng(X_16= [89617785]); -------------- This seems logical behavior since the condition value would overflow INT. (Of course the result set is empty by definition, but it’s a valid query)
However, the same happens when I then perform subsequent queries with values that *can* be represented by INT:
-------------- SELECT * FROM table WHERE column = 123; --------------
It is only when an explicit INT value is entered, that the type conversion is stopped for that query and subsequent ones:
-------------- SELECT * FROM table WHERE column = CAST(123 AS INT); --------------
In other words, once the type conversion is performed once, it seems to “stick” to subsequent queries.
Type conversion on that column takes ~4s on a 90MM row table here, while the regular query time is usually sub-millisecond. Entering the “wrong" value once will influence the DB performance heavily until restarted, which is far from a desirable scenario. Of course I could always clamp my value to INT range at the application level, but this is brittle, and the DB could do the right thing here.
Is there a design consideration that I’m missing, or should I file this as a bug/performance issue? Dennis
This has to do with the query cache, but I agree this could be considered a bug, ie could you file it on bugs.monetdb.org.
Niels
Thanks,
Dennis _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Indeed, this could be added as a step to the optimizer pipeline I suppose. The gain would be pretty modest though, as a hash lookup or binary search for an out of range value should be rather efficient (if the needless column conversion itself is fixed). Casting a long greater than INT_MAX to int will cause the query to fail with an overflow error. - Dennis
On Jul 8, 2015, at 16:54, Eyal Rozenberg
wrote: Sorry for jumping in with a newbie question, but isn't the first case Dennis describes also a bug, or at least a mis-feature?
That is, if the column is an INT, and the condition value does not equal any possible value of INT [1] - shouldn't MonetDB notice this fact and produce an empty BAT as the result without bothering convert the column?
Eyal
[1] - Unless it is force-cast as INT ignoring the overflow; but I would assume that is not what SQL semantics require.
On 08/07/2015 16:38, Niels Nes wrote:
On Wed, Jul 08, 2015 at 03:11:26PM +0200, Dennis Lorson wrote:
Hi all,
I have a table with an INT column. The table is sorted by that column, and monetdb has recognized the sort order.
While performing this query:
-------------- SELECT * FROM table WHERE column = 12345678900; --------------
monetdb converts
to BIGINT/LNG as evidenced by the trace: -------------- 3612684 | X_17=
[89617785] := batcalc.lng(X_16= [89617785]); -------------- This seems logical behavior since the condition value would overflow INT. (Of course the result set is empty by definition, but it’s a valid query)
However, the same happens when I then perform subsequent queries with values that *can* be represented by INT:
-------------- SELECT * FROM table WHERE column = 123; --------------
It is only when an explicit INT value is entered, that the type conversion is stopped for that query and subsequent ones:
-------------- SELECT * FROM table WHERE column = CAST(123 AS INT); --------------
In other words, once the type conversion is performed once, it seems to “stick” to subsequent queries.
Type conversion on that column takes ~4s on a 90MM row table here, while the regular query time is usually sub-millisecond. Entering the “wrong" value once will influence the DB performance heavily until restarted, which is far from a desirable scenario. Of course I could always clamp my value to INT range at the application level, but this is brittle, and the DB could do the right thing here.
Is there a design consideration that I’m missing, or should I file this as a bug/performance issue? Dennis
This has to do with the query cache, but I agree this could be considered a bug, ie could you file it on bugs.monetdb.org.
Niels
Thanks,
Dennis _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
As Niels pointed out, this has to do with query template generattion and caching. All literal constants in a query become arguments to the template. sql>explain SELECT * FROM t WHERE i = 12345678900; +-------------------------------------------------------------------------------------------------------+ | mal | +=======================================================================================================+ | function user.s4_1{autoCommit=true}(A0:lng):void; | This upcasting of constants could/should in principle be avoided. Turning off the query cache is worth considering. The overhead of the optimizers is neglectable compare to the query run time in this case. regards, Martin On 08/07/15 19:54, Dennis Lorson wrote:
Indeed, this could be added as a step to the optimizer pipeline I suppose.
The gain would be pretty modest though, as a hash lookup or binary search for an out of range value should be rather efficient (if the needless column conversion itself is fixed).
Casting a long greater than INT_MAX to int will cause the query to fail with an overflow error.
- Dennis
On Jul 8, 2015, at 16:54, Eyal Rozenberg
wrote: Sorry for jumping in with a newbie question, but isn't the first case Dennis describes also a bug, or at least a mis-feature?
That is, if the column is an INT, and the condition value does not equal any possible value of INT [1] - shouldn't MonetDB notice this fact and produce an empty BAT as the result without bothering convert the column?
Eyal
[1] - Unless it is force-cast as INT ignoring the overflow; but I would assume that is not what SQL semantics require.
On 08/07/2015 16:38, Niels Nes wrote:
On Wed, Jul 08, 2015 at 03:11:26PM +0200, Dennis Lorson wrote:
Hi all,
I have a table with an INT column. The table is sorted by that column, and monetdb has recognized the sort order.
While performing this query:
-------------- SELECT * FROM table WHERE column = 12345678900; --------------
monetdb converts
to BIGINT/LNG as evidenced by the trace: -------------- 3612684 | X_17=
[89617785] := batcalc.lng(X_16= [89617785]); -------------- This seems logical behavior since the condition value would overflow INT. (Of course the result set is empty by definition, but it’s a valid query)
However, the same happens when I then perform subsequent queries with values that *can* be represented by INT:
-------------- SELECT * FROM table WHERE column = 123; --------------
It is only when an explicit INT value is entered, that the type conversion is stopped for that query and subsequent ones:
-------------- SELECT * FROM table WHERE column = CAST(123 AS INT); --------------
In other words, once the type conversion is performed once, it seems to “stick” to subsequent queries.
Type conversion on that column takes ~4s on a 90MM row table here, while the regular query time is usually sub-millisecond. Entering the “wrong" value once will influence the DB performance heavily until restarted, which is far from a desirable scenario. Of course I could always clamp my value to INT range at the application level, but this is brittle, and the DB could do the right thing here.
Is there a design consideration that I’m missing, or should I file this as a bug/performance issue? Dennis
This has to do with the query cache, but I agree this could be considered a bug, ie could you file it on bugs.monetdb.org.
Niels
Thanks,
Dennis _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Dennis Lorson
-
Eyal Rozenberg
-
Martin Kersten
-
Niels Nes