Hello,
I am interested in finding out more about the ALTER TABLE ALTER COLUMN SET STORAGE feature and how it is related to compression.
As far as I understand, this is tied to an active development branch called MOSAIC, which has never been merged into any released MonetDB version. Compression is obviously an important feature that columnar databases provide for data storage and manipulation. A module like MOSAIC, which seems to support several compression techniques, would be an interesting option.
My first question: can the MOSAIC extension be used successfully (sources added and custom compiled), for any of its proposed codecs, with any of the newest versions (Apr2019 and later)? I mean without affecting any of the embedded, capi, rapi and pyapi modules, which all exchange data with external libraries.
A quick read of the MOSAIC code led me to understand that this compression can be applied only to read-only PERSISTENT columns. That means I would lose the major benefit of compression, which I need mostly during the import stage. Sure, I can imagine a controlled batched import scheme that appends data to a table and, once it reaches a certain threshold, makes the table read-only, compresses it, and adds it to a merge table, but that looks like quite an elaborate scenario. Am I wrong? Can MOSAIC be used in a different scenario?
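To make the batching scheme above concrete, here is a rough Python sketch of what I have in mind. It only builds SQL strings and counts rows; it does not talk to a server. The table and column names are made up, and the storage string 'runlength' is my assumption about how a MOSAIC codec would be named, which I have not verified against the branch.

```python
def plan_import(batch_sizes, threshold):
    """Accumulate incoming batches; seal a partition once it holds >= threshold rows.

    Returns (sealed, pending): the row counts of the sealed partitions and the
    number of rows left in the still-writable partition.
    """
    sealed = []
    pending = 0
    for rows in batch_sizes:
        pending += rows
        if pending >= threshold:
            sealed.append(pending)
            pending = 0
    return sealed, pending


def seal_partition_sql(merge_table, part_table, columns, codec="runlength"):
    """Build the statements that seal one filled partition table.

    Assumption: the codec name passed to SET STORAGE is hypothetical.
    """
    stmts = [f"ALTER TABLE {part_table} SET READ ONLY;"]
    for col in columns:
        stmts.append(
            f"ALTER TABLE {part_table} ALTER COLUMN {col} SET STORAGE '{codec}';"
        )
    # Only after compression is the partition attached to the merge table.
    stmts.append(f"ALTER TABLE {merge_table} ADD TABLE {part_table};")
    return stmts


if __name__ == "__main__":
    sealed, pending = plan_import([40, 40, 40, 10], threshold=100)
    print(sealed, pending)
    for stmt in seal_partition_sql("events", "events_p0", ["ts", "value"]):
        print(stmt)
```

Even in this toy form, every sealed partition needs one statement per column plus the bookkeeping around the merge table, which is why the scheme feels heavy to me.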
I can understand the reasons behind compressing only PERSISTENT BATs, yet I am wondering whether TRANSIENT BATs could also benefit from it, especially 1. in the result-building stage (server-client or embedded version) or 2. for remote connections, when data is transferred for merge operations.
Regarding the above question, is there any chance that you would consider keeping compressed results in memory? Sure, I could use on-disk temporary tables instead for subsequent manipulation, but for performance reasons in-memory compressed results would be much faster. In fact, when the embedded version provides a result set, it stays valid until the user releases it, so why not also allow it to be used in subsequent SQL operations that do not fit into a CTE scenario? That would provide greater flexibility and better memory management than the CTE mechanism. Temporary results could be developed in steps and accessed directly at any time, as conveniently as temporary views in a CTE, but without the burden of temporary BATs that are not released until the CTE ends.
Thank you,
Dan