Hi,
On 03/09/2020 14:40, Amit Pandey wrote:
> Thanks Martin, yeah, I get the fact that ultimately you have to "materialize", but whether you do so per tuple, per batch, or one operator at a time is the question.
>
> Interesting answer about the Cartesian product, but I guess the query optimizer will mostly avoid such a problem?
Only a cost-based optimizer has the knowledge to abort the action if it expects a huge answer.
(Abandoning the cost-based optimizer track as well is, however, a different story.)
In MonetDB it is detected at runtime, when the query attempts to claim too much space.
>
> Also, one unrelated question: the original X100 paper was a very interesting one, and I think it led to the birth of Vectorwise. However, are those changes actually integrated in MonetDB? I am kind of interested in that, as it seems that MonetDB could also
> have back-ported those changes.
X100 was a completely different prototype, which evolved into the commercial product Vectorwise.
Although most of the lessons learned during the ten years of work on MonetDB (1995-2005) were incorporated
in X100, it made some significant changes. Among them, going back to a Volcano-style execution engine :(
There was no reason to abandon the track set out in MonetDB of using materialized intermediates.
See the attached picture, which shows that MonetDB is competitive once you take
the 8x more powerful hardware used in 2014 into account:
HyPer 14 sec, MonetDB 228/8 ≈ 28 sec, Vectorwise 93 sec,
without MonetDB relying on JIT compilation or SIMD instructions.
A 'good' practice in science is to aim the spotlight at your successes,
despite the fact that the solutions presented are often incomplete proof-of-concepts,
not hardened by deployment in the real world. MonetDB is much more than
a research prototype.
Indeed, understanding the rationale behind design choices is essential,
and calls for a more holistic evaluation than what people wish you to believe.
Furthermore, science only progresses if you dare to try truly different routes
than what you are being taught in school.
regards, Martin Kersten
>
> On Thursday, September 3, 2020, 04:48:41 PM GMT+5:30, Martin Kersten <martin.kersten@cwi.nl> wrote:
>
>
> Hi,
>
> Every database system has to materialize intermediates.
> However, they differ in where and for how long these intermediates live.
>
> It could be as short as using CPU registers to hold the intermediates of
> an arithmetic expression. It could be a cache line, a memory page (4K),
> a chunk (256KB), or a (memory-mapped) file.
>
> Yes, MonetDB has chosen the latter from day one.
>
> pro:
> - the available (RAM) resources keep increasing and have fast access times.
> (1 billion integers comfortably fits in a modern server system)
>
> - when you run a relational operator you have precise knowledge
> of the properties of its arguments (size, order, density).
> This greatly simplifies selecting, just in time, the algorithm to execute the operator.
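To illustrate the point, here is a hypothetical Python sketch (not MonetDB code): once the exact properties of a materialized operand are known, the algorithm can be picked at runtime rather than guessed in advance by the optimizer. The `join` helper and its inputs are invented for illustration.

```python
# Hypothetical sketch: a materialized intermediate carries exact
# properties (here: sortedness), so the join strategy is a cheap
# runtime decision instead of an up-front cost-model guess.
from bisect import bisect_left

def join(left, right, left_sorted):
    """Pick a join strategy from known properties of the operands."""
    if left_sorted:
        # Sorted operand: binary-search each probe value.
        return [(v, v) for v in right
                if (i := bisect_left(left, v)) < len(left) and left[i] == v]
    # Unsorted operand: build a hash table instead.
    table = set(left)
    return [(v, v) for v in right if v in table]

print(join([1, 3, 5, 7], [3, 7, 8], left_sorted=True))   # [(3, 3), (7, 7)]
print(join([5, 1, 7, 3], [3, 7, 8], left_sorted=False))  # [(3, 3), (7, 7)]
```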
>
> - the optimizer is simplified: it delegates the decision of
> which algorithms to apply to runtime
>
> - under the hood, column views are used all over the place
> to avoid actual copying; copying is postponed until needed
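The view mechanism can be mimicked in a few lines of Python. This is illustrative only; `memoryview` stands in here for MonetDB's internal column views.

```python
# Illustrative only: a memoryview slice shares the underlying buffer,
# much like the column views described above; no data is copied until
# the consumer actually needs its own bytes.
import array

column = array.array('i', range(10))   # a small "column"
view = memoryview(column)[2:6]         # a view: no copy made
materialized = view.tolist()           # copying happens only here

print(materialized)  # [2, 3, 4, 5]
```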
>
> - a column, called a BAT in MonetDB, is nothing more than
> what other systems would call a database page or chunk,
> with one big difference: its size is *not* determined by the OS,
> file system, or buffers, but is variable.
>
> - you can break a BAT into smaller chunks at no cost,
> push down the relational expression, and merge the results.
> This is how MonetDB exploits multiple cores, as it schedules
> tasks at the operator level.
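A toy rendition of that scheduling idea, with invented helper names (`chunked`, `select_gt`) standing in for the real operator machinery:

```python
# Toy sketch of operator-level parallelism: slice a column into chunks
# (views in a real engine, so no copying), apply the operator per chunk
# on a thread pool, then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def chunked(col, n):
    """Yield n roughly equal slices of the column."""
    step = (len(col) + n - 1) // n
    for i in range(0, len(col), step):
        yield col[i:i + step]

def select_gt(chunk, threshold):
    return [v for v in chunk if v > threshold]

column = list(range(100))
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = pool.map(lambda c: select_gt(c, 89), chunked(column, 4))
result = [v for part in parts for v in part]
print(result)  # [90, 91, ..., 99]
```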
>
> con:
> - if you run a simple 1 billion x 1 billion Cartesian product,
> the system bails out with an error message instead of
> producing all combinations over time
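That bail-out amounts to a simple admission check. The sketch below is hypothetical (the function name, row width, and byte budget are invented), not MonetDB's actual code:

```python
# Hypothetical admission check: the operand sizes of a materialized plan
# are known exactly, so an impossible intermediate can be refused up front.
def cartesian_product_size_check(n_left, n_right, bytes_per_row=8,
                                 budget_bytes=64 * 2**30):
    """Raise instead of materializing an intermediate that cannot fit."""
    needed = n_left * n_right * bytes_per_row
    if needed > budget_bytes:
        raise MemoryError(
            f"join would materialize ~{needed / 2**40:.0f} TiB; aborting")
    return needed

cartesian_product_size_check(1_000, 1_000)      # fine: ~8 MB
# cartesian_product_size_check(10**9, 10**9)    # raises MemoryError
```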
>
> The overall effect of materialization goes beyond mere
> resource size and copying; its benefits come from a clear
> separation of concerns.
> This is just a snippet of the rationale. The maintainability of the code base
> and the performance of MonetDB demonstrate the validity of this approach.
>
> regards, Martin
>
> On 03/09/2020 12:29, Amit Pandey wrote:
> > Hey All,
> >
> > Thanks for the great open source project. While going through some material in a CMU DB course, I saw it mentioned that MonetDB uses full materialization. This is a bit surprising since, as in C-Store, fully
> > materializing a scan over a billion-row column may take a lot of time, and the classic Volcano model might be better.
> >
> > Can you guys please explain the rationale ?
> > These are the slides:
> > https://15445.courses.cs.cmu.edu/fall2019/slides/12-queryexecution1.pdf
> > MonetDB is mentioned in Slide 11.
> >
> > Thanks and Regards
>
> >
> > _______________________________________________
> > users-list mailing list
> > users-list@monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list