Thanks Martin, yes, I get that ultimately you have to "materialize", but whether you do so per tuple, per batch, or one operator at a time is the question.
Interesting answer about the Cartesian product, but I guess the query optimizer will mostly avoid such a problem?
Also, one unrelated question: the original X100 paper was a very interesting one and, I think, led to the birth of VectorWise. But were those changes ever integrated into MonetDB? I am curious about that, as it seems MonetDB could also have back-ported them.
On Thursday, September 3, 2020, 04:48:41 PM GMT+5:30, Martin Kersten wrote:
Hi,
Every database system has to materialize intermediates.
However, they differ where and for how long these intermediates live.
It could be as short as using CPU registers to hold intermediates of an
arithmetic expression. It could be a cache line, a memory page (4K),
a chunk (256KB), or a (memory-mapped) file.
Yes, MonetDB has chosen the latter from day one.
pro:
- the available (RAM) resources keep increasing and have fast access times.
(1 billion integers comfortably fits on a modern server system)
- when you run a relational operator you have precise knowledge
of the properties of its arguments (size, order, density).
This greatly simplifies finding, just in time, the algorithm to execute the operator.
- the optimizer is simplified: it delegates the decision of
which algorithms to apply to runtime
- under the hood, column views are used all over the place
to avoid actual copying and copying is postponed until needed
- a column, called a BAT in MonetDB, is nothing more than
what other systems would call a database page or chunk.
With one big difference: its size is *not* determined by the OS,
file system, or buffers. Instead, it is of variable size.
- you can break a BAT into smaller chunks at no cost,
push down the relational expression, and merge the results.
This is how MonetDB exploits multi-cores: it schedules
tasks at the operator level.
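The operator-at-a-time style described above can be sketched in a few lines of Python. This is a toy illustration only, not MonetDB's actual C internals: each operator consumes whole input columns and fully materializes its result before the next operator runs, and because inputs are whole columns they can be sliced into chunks, processed independently, and merged.

```python
# Toy operator-at-a-time evaluation (illustrative, not MonetDB code).
# Every operator materializes its complete result before the next runs.

def select_lt(col, threshold):
    # Materialize the full list of qualifying positions.
    return [i for i, v in enumerate(col) if v < threshold]

def project(col, positions):
    # Materialize the full projected column.
    return [col[i] for i in positions]

age = [25, 41, 33, 19, 52, 38]
salary = [30, 80, 55, 20, 90, 60]

# SELECT salary FROM t WHERE age < 40, one operator at a time:
cand = select_lt(age, 40)        # full intermediate: [0, 2, 3, 5]
result = project(salary, cand)   # full result: [30, 55, 20, 60]

# A materialized column can be split into chunks at no cost, the same
# operator run per chunk (in MonetDB: in parallel on multiple cores),
# and the per-chunk results merged:
def chunked_select_lt(col, threshold, n_chunks=2):
    size = (len(col) + n_chunks - 1) // n_chunks
    out = []
    for start in range(0, len(col), size):
        chunk = col[start:start + size]  # a slice; a zero-copy view in MonetDB
        out.extend(start + i for i in select_lt(chunk, threshold))
    return out

assert chunked_select_lt(age, 40) == cand
```

In contrast, a Volcano-style executor would pull one tuple at a time through the whole operator tree, never materializing these intermediates.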
con:
- if you run a simple 1 billion x 1 billion Cartesian product,
the system bails out with an error message instead of
producing all combinations over time.
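To put a number on why such a query cannot be fully materialized, a back-of-the-envelope check (my own arithmetic, not from the original mail): the cross product has 10^18 rows, and even at an optimistic 8 bytes per row that is about 8 exabytes, far beyond any machine's RAM.

```python
rows = 10**9 * 10**9      # tuples in a 1 billion x 1 billion cross product
bytes_needed = rows * 8   # optimistic lower bound of 8 bytes per tuple
# bytes_needed is 8 * 10**18, i.e. roughly 8 exabytes
```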
The overall effect of materialization goes beyond mere
resource size and copying; its benefits come from a clear
separation of concerns.
This is just a snippet of the rationale. The code base maintenance
and performance of MonetDB demonstrate the validity of this approach.
regards, Martin
On 03/09/2020 12:29, Amit Pandey wrote:
Hey All,
Thanks for the great open source project. While going through some materials in a CMU DB course, I saw it mentioned that MonetDB uses full materialization. This seems a bit odd, since in a column store potentially scanning a billion rows, full
materialization may take a lot of time and the classic Volcano model might be better.
Can you guys please explain the rationale ?
These are the slides: https://15445.courses.cs.cmu.edu/fall2019/slides/12-queryexecution1.pdf
MonetDB is mentioned in Slide 11.
Thanks and Regards
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list