Hi,
On 03/09/2020 14:40, Amit Pandey wrote:
> Thanks Martin, yeah, I get the fact that ultimately you have to "materialize", but whether you do so per tuple, per batch, or one operator at a time is the question.
>
> Interesting answer about the Cartesian product, but I guess the query optimizer will mostly avoid such a problem?
Only a cost-based optimizer has the knowledge to abort the action if it expects a huge answer.
(Abandoning the cost-based optimizer track as well is, however, a different story.)
In MonetDB it is detected at runtime, when the query attempts to claim too much space.
>
> Also, one unrelated question: the original X100 paper was a very interesting one, and I think it led to the birth of Vectorwise. However, are those changes actually integrated in MonetDB? I am kind of interested in that, as it seems that MonetDB could also
> have back-ported those changes.
X100 was a completely different prototype, which evolved into the commercial product Vectorwise.
Although most of the lessons learned during the ten years of work on MonetDB (1995-2005) were incorporated
in X100, it made some significant changes. Among them, going back to a Volcano-style execution engine :(
There was no reason to abandon the track set out in MonetDB of using materialized intermediates.
See the attached picture, which shows that MonetDB is competitive once you take
the 8x more powerful hardware used in 2014 into account:
HyPer 14 sec, MonetDB 228/8 ≈ 28 sec, Vectorwise 93 sec,
without MonetDB relying on JIT compilation or SIMD instructions.
A 'good' practice in science is to aim the spotlight at your successes,
despite the fact that the solutions presented are often incomplete proof-of-concepts,
not hardened by deployment in the real world. MonetDB is much more than
a research prototype.
Indeed, understanding the rationale behind design choices is essential,
and calls for a more holistic evaluation than what people wish you to believe.
Furthermore, science only progresses if you dare to try truly different routes
than what you are being taught in school.
regards, Martin Kersten
>
> On Thursday, September 3, 2020, 04:48:41 PM GMT+5:30, Martin Kersten <martin.kersten@cwi.nl> wrote:
>
>
> Hi,
>
> Every database system has to materialize intermediates.
> However, they differ in where and for how long these intermediates live.
>
> It could be as short as using CPU registers to hold the intermediates of
> an arithmetic expression. It could be a cache line, a memory page (4K),
> a chunk (256KB), or a (memory-mapped) file.
>
> Yes, MonetDB has chosen the latter from day one.
>
> pro:
> - the available (RAM) resources keep increasing and have fast access times.
> (1 billion integers comfortably fits in a modern server system)
>
> - when you run a relational operator you have precise knowledge
> of the properties of its arguments (size, order, density).
> This greatly simplifies selecting, just in time, the algorithm to execute the operator.
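To illustrate the point, here is a hypothetical Python sketch (not MonetDB code): once the exact properties of a materialized operand are known, the algorithm can be picked at runtime rather than guessed in advance by the optimizer. The `join` helper and its inputs are invented for illustration.

```python
# Hypothetical sketch: a materialized intermediate carries exact
# properties (here: sortedness), so the join strategy is a cheap
# runtime decision instead of an up-front cost-model guess.
from bisect import bisect_left

def join(left, right, left_sorted):
    """Pick a join strategy from known properties of the operands."""
    if left_sorted:
        # Sorted operand: binary-search each probe value.
        return [(v, v) for v in right
                if (i := bisect_left(left, v)) < len(left) and left[i] == v]
    # Unsorted operand: build a hash table instead.
    table = set(left)
    return [(v, v) for v in right if v in table]

print(join([1, 3, 5, 7], [3, 7, 8], left_sorted=True))   # [(3, 3), (7, 7)]
print(join([5, 1, 7, 3], [3, 7, 8], left_sorted=False))  # [(3, 3), (7, 7)]
```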
>
> - the optimizer is simplified: it delegates the decision of
> which algorithms to apply to runtime
>
> - under the hood, column views are used all over the place
> to avoid actual copying; copying is postponed until needed
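The view mechanism can be mimicked in a few lines of Python. This is illustrative only; `memoryview` stands in here for MonetDB's internal column views.

```python
# Illustrative only: a memoryview slice shares the underlying buffer,
# much like the column views described above; no data is copied until
# the consumer actually needs its own bytes.
import array

column = array.array('i', range(10))   # a small "column"
view = memoryview(column)[2:6]         # a view: no copy made
materialized = view.tolist()           # copying happens only here

print(materialized)  # [2, 3, 4, 5]
```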
>
> - a column, called a BAT in MonetDB, is nothing more than
> what other systems would call a database page or chunk,
> with one big difference: its size is *not* determined by the OS,
> file system, or buffers, but is variable.
>
> - you can break a BAT into smaller chunks at no cost,
> push down the relational expression, and merge the results.
> This is how MonetDB exploits multiple cores, as it schedules
> tasks at the operator level.
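A toy rendition of that scheduling idea, with invented helper names (`chunked`, `select_gt`) standing in for the real operator machinery:

```python
# Toy sketch of operator-level parallelism: slice a column into chunks
# (views in a real engine, so no copying), apply the operator per chunk
# on a thread pool, then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def chunked(col, n):
    """Yield n roughly equal slices of the column."""
    step = (len(col) + n - 1) // n
    for i in range(0, len(col), step):
        yield col[i:i + step]

def select_gt(chunk, threshold):
    return [v for v in chunk if v > threshold]

column = list(range(100))
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = pool.map(lambda c: select_gt(c, 89), chunked(column, 4))
result = [v for part in parts for v in part]
print(result)  # [90, 91, ..., 99]
```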
>
> con:
> - if you run a simple 1 billion x 1 billion Cartesian product,
> the system bails out with an error message instead of
> producing all combinations over time
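That bail-out amounts to a simple admission check. The sketch below is hypothetical (the function name, row width, and byte budget are invented), not MonetDB's actual code:

```python
# Hypothetical admission check: the operand sizes of a materialized plan
# are known exactly, so an impossible intermediate can be refused up front.
def cartesian_product_size_check(n_left, n_right, bytes_per_row=8,
                                 budget_bytes=64 * 2**30):
    """Raise instead of materializing an intermediate that cannot fit."""
    needed = n_left * n_right * bytes_per_row
    if needed > budget_bytes:
        raise MemoryError(
            f"join would materialize ~{needed / 2**40:.0f} TiB; aborting")
    return needed

cartesian_product_size_check(1_000, 1_000)      # fine: ~8 MB
# cartesian_product_size_check(10**9, 10**9)    # raises MemoryError
```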
>
> The overall effect of materialization goes beyond mere
> resource size and copying; its benefits come from a clear
> separation of concerns.
> This is just a snippet of the rationale. The maintainability of the code base
> and the performance of MonetDB demonstrate the validity of this approach.
>
> regards, Martin
>
> On 03/09/2020 12:29, Amit Pandey wrote:
> > Hey All,
> >
> > Thanks for the great open source project. While going through some material in a CMU DB course, I saw it mentioned that MonetDB uses full materialization. This is a bit surprising since, as in C-Store, fully
> > materializing a scan over a billion-row column may take a lot of time, and the classic Volcano model might be better.
> >
> > Can you guys please explain the rationale ?
> > These are the slides:
> > https://15445.courses.cs.cmu.edu/fall2019/slides/12-queryexecution1.pdf
> > MonetDB is mentioned in Slide 11.
> >
> > Thanks and Regards
>
> >
> > _______________________________________________
> > users-list mailing list
> > users-list@monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list