Understanding strides in memory accesses

Hi All, I am analysing the monetdb memory access pattern to identify various strides in LLC accesses. But most of the accesses per object are being observed as non-strided which is not what I expected. As monetdb involves BATs and these are arrays so I was thinking that there would be more strided pattern rather than random accesses unless BAT arrays are being treated as trees and random access is being used to access the items. To understand it further, I have now retrieved the execution plan of TPCH Q1 and the plan has various segments like follows: X_455:bat[:oid,:str] := sql.bind(X_7,"sys","lineitem","l_linestatus",2,5,16); X_1346 := algebra.kdifference(X_400,X_455); X_1379 := algebra.kunion(X_1346,X_455); X_1412 := algebra.leftjoin(X_1180,X_1379); (X_1452,X_1453) := group.multicolumns(X_1293,X_1412); X_1514 := bat.mirror(X_1452); X_1516 := algebra.leftjoin(X_1514,X_1412); I have two questions: 1) As there are many repeated segments like above in whole query plan, so what is the context behind multiple repeated segments please. 2) I want to understand what is the input to most common operators like leftjoin, kdifference, kunion and whether they process those inputs sequentially or randomly. What is the best way to get this understanding please? From the plan it is obvious that the input is the BAT(s), but how do the operators access the BAT is really a question. Thanks for the help. Kind Regards, Ahmad

Hi Ahmad, There is no one-liner answer to these questions, as the code produced depends on many decisions taken in the SQL-> MAL compiler/optimizer. Most of them reporeted by you have to do with the MVCC code management and the rest are individual kernel MAL instructions. A look at the underlying implementation will show that access patterns depend on many properties of the objects being accessed. All optimized for the situation at hand. Success with your study of the MonetDB code base. regards, Martin On 4/29/13 8:43 PM, Hassan, Ahmad wrote:
Hi All,
I am analysing the monetdb memory access pattern to identify various strides in LLC accesses. But most of the accesses per object are being observed as non-strided which is not what I expected. As monetdb involves BATs and these are arrays so I was thinking that there would be more strided pattern rather than random accesses unless BAT arrays are being treated as trees and random access is being used to access the items. To understand it further, I have now retrieved the execution plan of TPCH Q1 and the plan has various segments like follows:
X_455:bat[:oid,:str] := sql.bind(X_7,"sys","lineitem","l_linestatus",2,5,16);
X_1346 := algebra.kdifference(X_400,X_455);
X_1379 := algebra.kunion(X_1346,X_455);
X_1412 := algebra.leftjoin(X_1180,X_1379);
(X_1452,X_1453) := group.multicolumns(X_1293,X_1412);
X_1514 := bat.mirror(X_1452);
X_1516 := algebra.leftjoin(X_1514,X_1412);
I have two questions:
1)As there are many repeated segments like above in whole query plan, so what is the context behind multiple repeated segments please.
2)I want to understand what is the input to most common operators like leftjoin, kdifference, kunion and whether they process those inputs sequentially or randomly. What is the best way to get this understanding please? From the plan it is obvious that the input is the BAT(s), but how do the operators access the BAT is really a question.
Thanks for the help.
Kind Regards, Ahmad
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list

On Mon, Apr 29, 2013 at 06:43:09PM +0000, Hassan, Ahmad wrote:
Hi All,
I am analysing the monetdb memory access pattern to identify various strides in LLC accesses. But most of the accesses per object are being observed as non-strided which is not what I expected. As monetdb involves BATs and these are arrays so I was thinking that there would be more strided pattern rather than random accesses unless BAT arrays are being treated as trees and random access is being used to access the items. To understand it further, I have now retrieved the execution plan of TPCH Q1 and the plan has various segments like follows:
X_455:bat[:oid,:str] := sql.bind(X_7,"sys","lineitem","l_linestatus",2,5,16); X_1346 := algebra.kdifference(X_400,X_455); X_1379 := algebra.kunion(X_1346,X_455); X_1412 := algebra.leftjoin(X_1180,X_1379); (X_1452,X_1453) := group.multicolumns(X_1293,X_1412); X_1514 := bat.mirror(X_1452); X_1516 := algebra.leftjoin(X_1514,X_1412);
I have two questions:
1) As there are many repeated segments like above in whole query plan, so what is the context behind multiple repeated segments please.
This is due to "mitosis", MonetDB way of exploiting multicore parallelism by slicing one table (usually the largest) and replicating the query plan over all slices. A more detailed description goes beyond what would fit here.
2) I want to understand what is the input to most common operators like leftjoin, kdifference, kunion and whether they process those inputs sequentially or randomly. What is the best way to get this understanding please? From the plan it is obvious that the input is the BAT(s), but how do the operators access the BAT is really a question.
BATs in MonetDB represent columns of relational tables, hence they represent multi-sets, not trees. While MonetDB aims at using sequential access where ever efficiently feasible, we have no idea how to perform binary matching set operations, such as intersection, difference or join with only sequential access pattern (other than inherently inefficient nested loops). Instead, MonetDB mostly relies on hash-based algorithms, which perform sequential access on one BAT but inherently random access on the other. For more details, feel free to study the code. Best, Stefan
Thanks for the help.
Kind Regards, Ahmad
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

Hi Stefan, Thanks. This is very useful to help me in digging the code further. Kind Regards, Ahmad -----Original Message----- From: users-list [mailto:users-list-bounces+ahmad.hassan=sap.com@monetdb.org] On Behalf Of Stefan Manegold Sent: 29 April 2013 22:13 To: Communication channel for MonetDB users Subject: Re: Understanding strides in memory accesses On Mon, Apr 29, 2013 at 06:43:09PM +0000, Hassan, Ahmad wrote:
Hi All,
I am analysing the monetdb memory access pattern to identify various strides in LLC accesses. But most of the accesses per object are being observed as non-strided which is not what I expected. As monetdb involves BATs and these are arrays so I was thinking that there would be more strided pattern rather than random accesses unless BAT arrays are being treated as trees and random access is being used to access the items. To understand it further, I have now retrieved the execution plan of TPCH Q1 and the plan has various segments like follows:
X_455:bat[:oid,:str] := sql.bind(X_7,"sys","lineitem","l_linestatus",2,5,16); X_1346 := algebra.kdifference(X_400,X_455); X_1379 := algebra.kunion(X_1346,X_455); X_1412 := algebra.leftjoin(X_1180,X_1379); (X_1452,X_1453) := group.multicolumns(X_1293,X_1412); X_1514 := bat.mirror(X_1452); X_1516 := algebra.leftjoin(X_1514,X_1412);
I have two questions:
1) As there are many repeated segments like above in whole query plan, so what is the context behind multiple repeated segments please.
This is due to "mitosis", MonetDB way of exploiting multicore parallelism by slicing one table (usually the largest) and replicating the query plan over all slices. A more detailed description goes beyond what would fit here.
2) I want to understand what is the input to most common operators like leftjoin, kdifference, kunion and whether they process those inputs sequentially or randomly. What is the best way to get this understanding please? From the plan it is obvious that the input is the BAT(s), but how do the operators access the BAT is really a question.
BATs in MonetDB represent columns of relational tables, hence they represent multi-sets, not trees. While MonetDB aims at using sequential access where ever efficiently feasible, we have no idea how to perform binary matching set operations, such as intersection, difference or join with only sequential access pattern (other than inherently inefficient nested loops). Instead, MonetDB mostly relies on hash-based algorithms, which perform sequential access on one BAT but inherently random access on the other. For more details, feel free to study the code. Best, Stefan
Thanks for the help.
Kind Regards, Ahmad
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Hassan, Ahmad
-
Martin Kersten
-
Stefan Manegold