New subject: ALTER TABLE ALTER COLUMN SET STORAGE

13 Sep 2019

Hi Dan,

On 12-09-19 11:01, developers-list-request@monetdb.org wrote:
...
Send developers-list mailing list submissions to
  developers-list@monetdb.org
To subscribe or unsubscribe via the World Wide Web, visit
  https://www.monetdb.org/mailman/listinfo/developers-list
or, via email, send a message with subject or body 'help' to
  developers-list-request@monetdb.org
You can reach the person managing the list at
  developers-list-owner@monetdb.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of developers-list digest..."
Today's Topics:
  1. Re: ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
  2. Re: ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
----------------------------------------------------------------------
Message: 1
Date: Wed, 11 Sep 2019 13:53:27 +0300
From: Daniel Zvinca 
To: "Communication channel for developers of the MonetDB suite."

Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
Message-ID:

Content-Type: text/plain; charset="utf-8"
Thank you so much for your answer, Aris.
The good news is that compression is considered and is going to be part of
MonetDB in two releases or so (one year).
First release is probably much closer to half a year.
I will try to see if I can use the current code in a custom build, I am
quite curious how that will affect performance.
To be honest I don't expect performance issues: even in the SSD era it is
still faster to read compressed data and decompress it in memory, if the
right ratio is there, of course. And I do expect a decent compression ratio
on most of the data.
The idea is that between every memory fetch you have quite some CPU 
cycles at your disposal which can be used to perform either 
(potentially) more expensive algebraic operations directly on compressed 
data or to just decompress data. This is the general principle that 
implies that at least lightweight compression can be used to accelerate 
queries.
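As a minimal sketch of that principle, the snippet below uses run-length encoding as a stand-in for a lightweight codec and computes aggregates directly on the compressed form. The thread does not say which codecs or operators Mosaic provides, so the function names here are purely illustrative and are not MonetDB APIs.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def rle_sum(runs):
    """Sum the column without decompressing: one multiply-add per run."""
    return sum(v * n for v, n in runs)


def rle_count_equal(runs, needle):
    """Count rows equal to `needle` by inspecting one value per run."""
    return sum(n for v, n in runs if v == needle)


column = [7, 7, 7, 7, 3, 3, 9, 9, 9, 9, 9, 9]
runs = rle_encode(column)
```

With long runs, the aggregate touches far fewer memory locations than a scan of the uncompressed column, which is where the spare CPU cycles pay off.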
...
However, at this stage the MOSAIC's dual compressed - uncompressed
storage will obviously not give me the gain I need. Yet, it is interesting
to understand at least how query performance might look in the future.
Again this feature is still under discussion. But I don't think it is 
unlikely that it will enter into Mosaic.
...
The bad news, of course, is that the compression feature is going to happen
in ... one year. But if it also comes with support for compressing in-memory
results (I know, it wasn't promised), it might be worth the wait.
We're working on it ☺. More remarks below on your other email...
...
Best regards,
On Wed, Sep 11, 2019 at 12:18 PM aris 
wrote:
...
-------- Forwarded Message --------
Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
Date: Tue, 10 Sep 2019 15:15:09 +0200
From: aris 

To: developers-list-request@monetdb.org
Hi Daniel,
On 10-09-19 12:00, developers-list-request@monetdb.org wrote:
Today's Topics:
1. ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
----------------------------------------------------------------------
Message: 1
Date: Tue, 10 Sep 2019 12:40:22 +0300
From: Daniel Zvinca  
To: developers-list@monetdb.org
Subject: ALTER TABLE ALTER COLUMN SET STORAGE
Message-ID:

Content-Type: text/plain; charset="utf-8"
Hello,
I am interested in finding out more about the ALTER TABLE ALTER COLUMN SET
STORAGE feature and how it is related to compression.
As far as I understand, this is related to an active development branch
called MOSAIC, which was never merged into any of the previous MonetDB
versions. Obviously, compression is an important feature that columnar
databases provide for data storage and manipulation. A module like MOSAIC,
which seems to allow several compression techniques, would be an interesting
option.
Yes, compression, a.k.a. Mosaic, is going to be a new feature in MonetDB,
although it won't be included in the upcoming November release. Most likely,
you can expect the feature in the first release after the November release.
But Mosaic is a somewhat big undertaking. Our current road map will probably
span multiple future MonetDB releases before all envisioned compression
features are available in MonetDB. The first milestone in the current road
map is to apply a single compression technique to an entire column. One of
the next milestones is to partition a column into variable-sized compression
blocks, with a particular compression applied within each block.
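To illustrate the later milestone, here is a rough sketch of partitioning a column into variable-sized blocks, each tagged with its own codec. The two codecs (run-length and raw), the `min_run` cut-off, and the greedy partitioning rule are assumptions chosen for brevity; the thread does not describe how Mosaic actually chooses block boundaries or codecs.

```python
from itertools import groupby


def partition_into_blocks(values, min_run=4):
    """Split a column into variable-sized blocks, one codec per block.

    A run of at least `min_run` equal values becomes ("rle", value, count);
    everything between such runs is gathered into ("raw", [values...]).
    """
    blocks = []
    pending = []  # values waiting to be emitted as a raw block
    for v, run in groupby(values):
        n = sum(1 for _ in run)
        if n >= min_run:
            if pending:
                blocks.append(("raw", pending))
                pending = []
            blocks.append(("rle", v, n))
        else:
            pending.extend([v] * n)
    if pending:
        blocks.append(("raw", pending))
    return blocks


def decompress(blocks):
    """Rebuild the original column by dispatching on each block's codec."""
    out = []
    for block in blocks:
        if block[0] == "rle":
            _, v, n = block
            out.extend([v] * n)
        else:
            out.extend(block[1])
    return out


column = [1, 2, 3, 5, 5, 5, 5, 5, 8, 9, 9, 4, 4, 4, 4]
blocks = partition_into_blocks(column)
```

The first milestone (a single technique on an entire column) corresponds to the degenerate case of one block covering the whole column.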
First question I have: can the MOSAIC extension be used successfully (sources
added and custom compiled), for any of its proposed codecs, with any of the
newest versions (Apr2019 +)? I mean without affecting any of the embedded,
capi, rapi and pyapi modules, which all exchange data with external
libraries.
If by this you mean you want to import the mosaic module as an external
library into an existing release out of the box, then the answer is no.
There are some slight modifications in the GDK layer to accommodate the
Mosaic module. And to interact with it from SQL, there are also some code
changes in the SQL layer. But besides those dependencies, I don't expect
any issue with the particular (x)api frameworks. But nothing is guaranteed
obviously. It sounds like you want to hack-backport it into custom builds
of earlier releases. I wouldn't give it a zero chance of success, but I do
wish you much luck :)
A quick read of the MOSAIC code made me understand that this compression can
be applied only to read-only PERSISTENT columns. That means that I would lose
the major benefit of compression, which I mostly need during the import stage.
Sure, I can imagine a controlled batching import scheme that would append
data to tables and, when a table reaches a certain threshold, make it
read-only, compress it, and add it to a merge table, but this looks like
quite a scenario. Am I wrong? Can MOSAIC be used in a different
scenario?
Your observation about the joint life cycle of a Mosaic structure and its
original column file is correct: currently Mosaic adds a compressed
representation next to the existing uncompressed column. For the first
milestone on the Mosaic road map we want to successfully apply compression
on READ-ONLY pre-existing columns where the purpose of compression is to
potentially accelerate analytical queries on these columns. However we are
still looking into potentially freeing the uncompressed column once a
compressed Mosaic heap is available. This would accommodate compression for
the more traditional sake of limiting memory- and/or disk footprint.
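For concreteness, the batching import scheme Dan outlines could be planned roughly as follows. This is only a sketch under assumptions: the statement shapes follow MonetDB's general ALTER TABLE forms (SET READ ONLY, ALTER COLUMN ... SET STORAGE, ADD TABLE on a merge table), the storage string is a placeholder because the thread does not say what value the Mosaic branch expects, and the table, column, and function names are hypothetical.

```python
def plan_import(batch_sizes, threshold):
    """Accumulate incoming batches and seal a partition at `threshold` rows.

    Returns (row counts of sealed partitions, rows left in the open partition).
    """
    sealed = []
    current = 0
    for size in batch_sizes:
        current += size
        if current >= threshold:
            sealed.append(current)
            current = 0
    return sealed, current


def seal_partition_sql(merge_table, partition, columns, storage):
    """Statements to freeze, compress, and attach one filled partition.

    `storage` is a placeholder: the value Mosaic expects is an assumption.
    """
    stmts = [f"ALTER TABLE {partition} SET READ ONLY;"]
    stmts += [
        f"ALTER TABLE {partition} ALTER COLUMN {col} SET STORAGE '{storage}';"
        for col in columns
    ]
    stmts.append(f"ALTER TABLE {merge_table} ADD TABLE {partition};")
    return stmts


sealed, open_rows = plan_import([40, 40, 40, 10, 100], threshold=100)
statements = seal_partition_sql(
    "sales", "sales_p1", ["amount", "region"], "PLACEHOLDER"
)
```

Note that under the current dual storage the uncompressed column stays on disk next to the Mosaic heap, so this scheme would not yet reduce the footprint of the sealed partitions.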
I can understand the reasons behind compressing only PERSISTENT bats, yet I
am wondering if TRANSIENT bats could also benefit from it, especially (1) in
the result building stage (server-client or embedded version) or (2) for
remote connections when data is transferred for merging operations.
Regarding the above question, is there any chance that you would consider
keeping compressed results in memory? Sure, I can use on-disk temporary
tables instead for subsequent manipulation, but for performance reasons
in-memory compressed results would be way faster. Actually, when the embedded
version provides a result set, it stays valid until the user releases it, so
why not also be able to use it for possible subsequent SQL operations
that do not fit into a CTE scenario? That would provide superior
flexibility and memory management compared to the CTE mechanism. Temporary
results can be developed in steps, and they can be accessed directly at any
time, as conveniently as temporary views in a CTE, but without the burden of
possible temporary bats that are not released until the CTE ends.
I think it is an interesting idea. But I think it is part of a more
general goal/problem of how to handle updates on compressed data. There are
internal discussions on this topic. But whatever the outcome, this will be
only relevant for a much later milestone on the road map.
Thank you,
Dan
Hope it helps.
Kind regards,
Aris
