Hi,

I've had my eyes on MonetDB for quite some time and the NumPy implementation really pushed me over the edge (as in: this is the right solution for us). A big shout-out to Mark Raasveldt, both for the function itself and for the support and bug fixes along the way.

We have a fairly small dataset which probably won't exceed 10 million rows and 500 columns. We do flight data analysis, so each row can be thought of as a flight, and the columns are data points connected to it. We then use NumPy to prepare the data for graphs and limit our selection using standard SQL. We generally SELECT between 1 and 5 columns and use the WHERE clause on *another* 1 to 5 columns to select our rows.

However, as we develop more analyses we connect more and more data points (columns) to each row. There are a few ways we could deal with this:

1. Store the data in a column-less format (such as JSON) and re-create the MonetDB database whenever we need to add or remove columns. MonetDB is really fast at reading data in, and we have no problem automating this process. This is what we've done so far in our proof of concept (a sketch of the reload follows below).

2. Store everything in MonetDB and use schema alterations as new columns are added (requires altering a live database).

3. Split the data into multiple tables (one per analysis) and use JOIN to select from them. Instead of one huge table we get several small ones, but we still need schema alterations for anything besides adding an analysis.

4. Use a lookup-table layout with three columns: ID, NAME, VALUE. Instead of 10 million rows and 500 columns we would have 10 million × 500 rows, one row per data point. Since MonetDB excels at columnar storage, putting everything into one table with only three columns seems like a very bad fit. Nevertheless, it's a thought.

Before I spend more time doing proof-of-concepts on scenarios 2 and 3, I just wanted to ask: Am I missing something? Is there any obvious solution I am not covering? Is there any way to have "open" columns (i.e. dynamic columns instead of a fixed schema, where columns are created upon insert)?

Again, thank you so much for the work every one of you is doing on MonetDB. I hope to start contributing as I learn more (probably just adding documentation short-term, possibly committing fixes in the long-term).

Regards,
Niklas
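For readers following along, the reload in option 1 boils down to MonetDB's bulk loader. A minimal sketch, assuming a CSV dump generated from the JSON source; the table, columns, and file path are hypothetical:

    -- Re-create the table whenever the column set changes,
    -- then bulk-load from the flat file generated from the JSON source.
    DROP TABLE flights;
    CREATE TABLE flights (
        flight_id       INTEGER,
        departure_delay DOUBLE,
        fuel_burn       DOUBLE
        -- ... one column per data point
    );

    -- MonetDB's bulk loader; declaring the record count up front
    -- lets it preallocate storage and speeds up the load.
    COPY 10000000 RECORDS INTO flights
    FROM '/tmp/flights.csv'
    USING DELIMITERS ',', '\n', '"';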
On Nov 24, 2015, at 13:08, Niklas Bivald wrote:

> Hi,
> I've had my eyes on MonetDB for quite some time and the NumPy implementation really pushed me over the edge (as in: this is the right solution for us). A big shout-out to Mark Raasveldt, both for the function itself and for the support and bug fixes along the way. We have a fairly small dataset which probably won't exceed 10 million rows and 500 columns. We do flight data analysis, so each row can be thought of as a flight, and the columns are data points connected to it. We then use NumPy to prepare the data for graphs and limit our selection using standard SQL. We generally SELECT between 1 and 5 columns and use the WHERE clause on *another* 1 to 5 columns to select our rows.
> However, as we develop more analyses we connect more and more data points (columns) to each row. There are a few ways we could deal with this:
> 1. Store the data in a column-less format (such as JSON) and re-create the MonetDB database whenever we need to add or remove columns. MonetDB is really fast at reading data in, and we have no problem automating this process. This is what we've done so far in our proof of concept.
>
> 2. Store everything in MonetDB and use schema alterations as new columns are added (requires altering a live database).
Hai Niklas,

Whether this is a good idea for your application depends on how often you need to add new columns, and on the level of concurrency. A schema alteration will at least block the whole table, if not the whole schema (I'm not so sure about this).
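For concreteness, option 2 would amount to one ALTER per new data point, each of which takes the lock described above. A sketch with a hypothetical column name:

    -- Add a new data point to the live table; existing rows get NULL.
    ALTER TABLE flights ADD COLUMN turbulence_index DOUBLE;

    -- Backfill separately if the analysis needs a default value.
    UPDATE flights SET turbulence_index = 0.0
    WHERE turbulence_index IS NULL;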
> 3. Split the data into multiple tables (one per analysis) and use JOIN to select from them. Instead of one huge table we get several small ones, but we still need schema alterations for anything besides adding an analysis.
Same as above. Depending on your queries, the JOINs may not be a problem.
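A sketch of what option 3's layout could look like, with one narrow table per analysis keyed on a shared flight id (all names hypothetical); adding an analysis then becomes a CREATE TABLE rather than an ALTER TABLE:

    CREATE TABLE flights (
        flight_id       INTEGER PRIMARY KEY,
        departure_delay DOUBLE
    );
    CREATE TABLE fuel_analysis (
        flight_id   INTEGER,
        fuel_burn   DOUBLE,
        fuel_margin DOUBLE
    );

    -- Typical query shape: a few columns out, a few columns filtered.
    SELECT f.departure_delay, a.fuel_burn
    FROM flights f
    JOIN fuel_analysis a ON a.flight_id = f.flight_id
    WHERE a.fuel_margin < 0.1;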
> 4. Use a lookup-table layout with three columns: ID, NAME, VALUE. Instead of 10 million rows and 500 columns we would have 10 million × 500 rows, one row per data point. Since MonetDB excels at columnar storage, putting everything into one table with only three columns seems like a very bad fit. Nevertheless, it's a thought.
Largely depends on your queries. MonetDB doesn't have a preference for the number of columns in a table, but if your table is extremely long, it might take some time (also depending on whether and how the data are sorted) for the first select to filter out the rows actually needed by the query...
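To make that trade-off concrete: in the ID/NAME/VALUE (entity-attribute-value) layout of option 4, 10 million flights times 500 data points is 5 billion rows, and a query that filters on one data point while returning another turns into a self-join (names hypothetical):

    CREATE TABLE datapoints (
        id    INTEGER,
        name  VARCHAR(64),
        value DOUBLE
    );

    -- "SELECT fuel_burn WHERE departure_delay > 30" becomes:
    SELECT v.value AS fuel_burn
    FROM datapoints v
    JOIN datapoints w ON w.id = v.id
    WHERE v.name = 'fuel_burn'
      AND w.name = 'departure_delay'
      AND w.value > 30;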
> Before I spend more time doing proof-of-concepts on scenarios 2 and 3, I just wanted to ask:
> Am I missing something? Is there any obvious solution I am not covering? Is there any way to have "open" columns (i.e. dynamic columns instead of a fixed schema, where columns are created upon insert)?
AFAIK, this is not on the agenda.
> Again, thank you so much for the work every one of you is doing on MonetDB. I hope to start contributing as I learn more (probably just adding documentation short-term, possibly committing fixes in the long-term).
Thank you very much for trying MonetDB. Any contribution will be highly appreciated.

Regards,
Jennie
> Regards,
> Niklas