[MonetDB-users] General Questions on Columnar DB Design and Querying
Hi all,

I've managed to get MonetDB running and loaded with a chunk of data (thanks to some great help from the list) and am now trying to figure out the "right" way to make use of it. I don't have a whole lot of exposure to columnar stores, and am therefore not sure if I'm doing things in an idiomatically correct way. I have a couple of general questions about columnar DBs, and hope it's appropriate to ask them here even though they aren't specific to MonetDB. If these questions aren't considered appropriate for this list, I apologize in advance and will happily move them to a different forum (but would appreciate pointers to where such a forum might be).

Two questions are general, and one is fairly specific. First the general:

1. Does working with a columnar store fundamentally change the way one would structure the tables? I'm somewhat familiar with star-schema designs, where a central fact table holds the entire attribute set as rows of indexes, with supporting lookup tables for each column type, but it seems like columnar stores achieve the same thing without actually requiring the developer to specifically build those structures. Tables are constructed in a de-normalized RDBMS manner and the underlying DB takes care of the vectorization. Is this a valid description, or am I fundamentally misunderstanding the basics?

2. If there are significant differences in the way columnar stores should be designed, can folks point me to any resources (web, book, article, etc.) to help educate me on the topic?

Now the specific: I have several sets of time-series data modeled as follows:

    create table sensordata (
        sensor varchar(32),
        epoch_ts bigint,
        reading double,
        CONSTRAINT SENSORDATA_PKEY PRIMARY KEY (sensor, epoch_ts)
    );

I've populated that data, and now want to query data for a subset of sensors and generate a table that shows the value of the sensors at each point in time. I'm not sure what the right way to do this is. My desired output would be:

    epoch_ts   sensor1   sensor2   delta
    --------   -------   -------   -----
    <time1>    23.5      24.1       0.6
    <time2>    19.3      25.2       5.9
    ...
    <timeN>    25.1      19.3      -5.8

I tried to build that query as follows but end up with NULL results (note: I'm in no way a SQL pro, so it might be really ugly):

    select a.epoch_ts as epoch,
           a.reading as sensor1_reading,
           b.reading as sensor2_reading,
           b.reading - a.reading as delta
    from (select epoch_ts, reading from sensordata where sensor='sensor1') a,
         (select epoch_ts, reading from sensordata where sensor='sensor2') b
    where b.epoch_ts = a.epoch_ts
    order by epoch asc;

Is this a sane way to approach the problem, or am I completely off track here? Each subselect returns results, but they obviously aren't joining correctly, and it feels like it might simply be the wrong way to try to build the result set. I appreciate any direction you might provide. Thanks!
Hi Sean, On Sat, Nov 12, 2011 at 07:00:18PM -0600, Sean McNamara wrote:
> Hi all,
> I've managed to get MonetDB running and loaded with a chunk of data (thanks to some great help from the list) and am now trying to figure out the "right" way to make use of it. I don't have a whole lot of exposure to columnar stores, and am therefore not sure if I'm doing things in an idiomatically correct way. I have a couple of general questions about columnar DBs, and hope it's appropriate to ask them here even though they aren't specific to MonetDB. If these questions aren't considered appropriate for this list, I apologize in advance and will happily move them to a different forum (but would appreciate pointers to where such a forum might be).
>
> Two questions are general, and one is fairly specific. First the general:
>
> 1. Does working with a columnar store fundamentally change the way one would structure the tables? I'm somewhat familiar with star-schema designs, where a central fact table holds the entire attribute set as rows of indexes, with supporting lookup tables for each column type, but it seems like columnar stores achieve the same thing without actually requiring the developer to specifically build those structures. Tables are constructed in a de-normalized RDBMS manner and the underlying DB takes care of the vectorization. Is this a valid description, or am I fundamentally misunderstanding the basics?
In general, the physical storage (and processing) model of a DBMS should not influence the logical schema design. The latter should IMHO mainly be driven by application characteristics and requirements. Schema normalization in general, and star schemas in particular, aim at avoiding redundant storage of data. This is IMHO independent of whether the underlying DBMS uses a row-store or a column-store model. De-normalization results in redundant storage of identical data with both row-stores and column-stores, which is usually not desirable.
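To make the normalization point concrete with the sensor example from this thread: the sensordata table stores the sensor name redundantly with every reading, and a normalized variant would factor it out into its own table. This is only a sketch; the table and column names (sensors, readings, sensor_id) are illustrative, not from the thread:

```sql
-- Sensor names stored once, referenced by a compact key.
create table sensors (
    sensor_id int primary key,
    name      varchar(32)
);

-- One row per reading; the varchar name is not repeated here.
create table readings (
    sensor_id int,
    epoch_ts  bigint,
    reading   double,
    CONSTRAINT READINGS_PKEY PRIMARY KEY (sensor_id, epoch_ts),
    FOREIGN KEY (sensor_id) REFERENCES sensors (sensor_id)
);
```

Whether this pays off depends on the workload, not on the row- vs. column-store question: the redundancy exists (or is avoided) in either storage model.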
> 2. If there are significant differences in the way columnar stores should be designed, can folks point me to any resources (web, book, article, etc.) to help educate me on the topic?
Generally not (see above). Specific cases might benefit from specific designs, but that depends on the actual DBMS, the application needs, data and workload characteristics, and can thus hardly be generalized.
> Now the specific:
> I have several sets of time-series data modeled as follows:
>
>     create table sensordata (
>         sensor varchar(32),
>         epoch_ts bigint,
>         reading double,
>         CONSTRAINT SENSORDATA_PKEY PRIMARY KEY (sensor, epoch_ts)
>     );
>
> I've populated that data, and now want to query data for a subset of sensors and generate a table that shows the value of the sensors at each point in time. I'm not sure what the right way to do this is. My desired output would be:
>
>     epoch_ts   sensor1   sensor2   delta
>     --------   -------   -------   -----
>     <time1>    23.5      24.1       0.6
>     <time2>    19.3      25.2       5.9
>     ...
>     <timeN>    25.1      19.3      -5.8
>
> I tried to build that query as follows but end up with NULL results (note: I'm in no way a SQL pro, so it might be really ugly):
>
>     select a.epoch_ts as epoch,
>            a.reading as sensor1_reading,
>            b.reading as sensor2_reading,
>            b.reading - a.reading as delta
>     from (select epoch_ts, reading from sensordata where sensor='sensor1') a,
>          (select epoch_ts, reading from sensordata where sensor='sensor2') b
>     where b.epoch_ts = a.epoch_ts
>     order by epoch asc;
>
> Is this a sane way to approach the problem, or am I completely off track here? Each subselect returns results, but they obviously aren't joining correctly, and it feels like it might simply be the wrong way to try to build the result set.
works fine with me (see below) --- provided the time stamps are indeed identical for both sensors ...

sql>create table sensordata (
more>    sensor varchar(32),
more>    epoch_ts bigint,
more>    reading double,
more>    CONSTRAINT SENSORDATA_PKEY PRIMARY KEY (sensor, epoch_ts)
more>);
operation successful (237.863ms)
sql>insert into sensordata values ('sensor1',1,1.1);
1 affected row (52.619ms)
sql>insert into sensordata values ('sensor1',2,2.1);
1 affected row (80.506ms)
sql>insert into sensordata values ('sensor1',3,3.1);
1 affected row (21.852ms)
sql>insert into sensordata values ('sensor1',4,5.1);
1 affected row (23.926ms)
sql>insert into sensordata values ('sensor2',1,1.2);
1 affected row (22.177ms)
sql>insert into sensordata values ('sensor2',2,2.2);
1 affected row (48.279ms)
sql>insert into sensordata values ('sensor2',3,3.2);
1 affected row (23.667ms)
sql>insert into sensordata values ('sensor2',5,5.2);
1 affected row (24.000ms)
sql>select * from sensordata;
+---------+----------+--------------------+
| sensor  | epoch_ts | reading            |
+=========+==========+====================+
| sensor1 |        1 | 1.1000000000000001 |
| sensor1 |        2 | 2.1000000000000001 |
| sensor1 |        3 | 3.1000000000000001 |
| sensor1 |        4 | 5.0999999999999996 |
| sensor2 |        1 | 1.2                |
| sensor2 |        2 | 2.2000000000000002 |
| sensor2 |        3 | 3.2000000000000002 |
| sensor2 |        5 | 5.2000000000000002 |
+---------+----------+--------------------+
8 tuples (3.385ms)
sql>select a.epoch_ts as epoch,
more>       a.reading as sensor1_reading,
more>       b.reading as sensor2_reading,
more>       b.reading - a.reading as delta
more>from
more>     (select epoch_ts, reading from sensordata where sensor='sensor1') a,
more>     (select epoch_ts, reading from sensordata where sensor='sensor2') b
more>where b.epoch_ts = a.epoch_ts
more>order by epoch asc;
+-------+--------------------+--------------------+----------------------+
| epoch | sensor1_reading    | sensor2_reading    | delta                |
+=======+====================+====================+======================+
|     1 | 1.1000000000000001 | 1.2                | 0.099999999999999867 |
|     2 | 2.1000000000000001 | 2.2000000000000002 | 0.10000000000000009  |
|     3 | 3.1000000000000001 | 3.2000000000000002 | 0.10000000000000009  |
+-------+--------------------+--------------------+----------------------+
3 tuples (38.191ms)

Stefan
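Note the caveat about identical time stamps in action above: epochs 4 (sensor1 only) and 5 (sensor2 only) are silently dropped by the inner join. One possible way to keep unmatched epochs is a full outer join with COALESCE on the timestamp. This is a sketch, not from the thread, and assumes the MonetDB version in use supports FULL OUTER JOIN; the delta is NULL wherever one side is missing:

```sql
select coalesce(a.epoch_ts, b.epoch_ts) as epoch,
       a.reading as sensor1_reading,
       b.reading as sensor2_reading,
       b.reading - a.reading as delta   -- NULL when either reading is absent
from      (select epoch_ts, reading from sensordata where sensor='sensor1') a
full outer join
          (select epoch_ts, reading from sensordata where sensor='sensor2') b
       on b.epoch_ts = a.epoch_ts
order by epoch asc;
```

With the sample data above this would return five rows (epochs 1-5) instead of three, with NULLs in the sensor1 column at epoch 5 and in the sensor2 column at epoch 4.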
> I appreciate any direction you might provide.
> Thanks!
--
| Stefan.Manegold @ CWI.nl | DB Architectures  (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123  (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)   |
participants (2)
- Sean McNamara
- Stefan Manegold