[MonetDB-users] Open source Column based DB

I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this. Thanks.

On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi, We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql) The other column based system we are looking at is lucidDB. The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL. These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy. I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-) My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data. But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table. Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc. However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong) Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL. So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example; "Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))" or even better, "Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))" or even better better; "Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))" so just have an aggregate function that wraps some time series libraries into aggregate functions. Oh, and having the moon on a stick would be nice as well!!!! Cheers, Tom

InfoBright isn't really columnar, but they might still be worth looking at.
-T
On Fri, Mar 20, 2009 at 5:19 AM, Jos van Dongen
There is one other open source column based DB that might be worth looking at: Infobright Community Edition (http://www.infobright.org/) But if you want the moon on a stick, have a look at Esper, a realtime streaming database that supports the kinds of queries you mentioned in the UDF examples
Good luck, Jos van Dongen
Tom H wrote:
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

The other thing about Infobright is that the open source version only
supports bulk load and you have to purchase the enterprise edition for
insert/update/delete.
On Fri, Mar 20, 2009 at 7:21 AM, Thomas Briggs
InfoBright isn't really columnar, but they might still be worth looking at.
-T
There is one other open source column based DB that might be worth looking at: Infobright Community Edition (http://www.infobright.org/) But if you want the moon on a stick, have a look at Esper, a realtime streaming database that supports the kinds of queries you mentioned in
On Fri, Mar 20, 2009 at 5:19 AM, Jos van Dongen
wrote: the UDF examples
Good luck, Jos van Dongen
Tom H wrote:
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

I see. So it seems, open source column database wise, we're left with just
LucidDB and MonetDB?
On Fri, Mar 20, 2009 at 6:21 AM, Thomas Briggs
InfoBright isn't really columnar, but they might still be worth looking at.
-T
There is one other open source column based DB that might be worth looking at: Infobright Community Edition (http://www.infobright.org/) But if you want the moon on a stick, have a look at Esper, a realtime streaming database that supports the kinds of queries you mentioned in
On Fri, Mar 20, 2009 at 5:19 AM, Jos van Dongen
wrote: the UDF examples
Good luck, Jos van Dongen
Tom H wrote:
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Those are the only two I'm aware of, yes.
On Fri, Mar 20, 2009 at 12:26 PM, Yue Sheng
I see. So it seems, open source column database wise, we're left with just LucidDB and MonetDB?
On Fri, Mar 20, 2009 at 6:21 AM, Thomas Briggs
wrote: InfoBright isn't really columnar, but they might still be worth looking at.
-T
On Fri, Mar 20, 2009 at 5:19 AM, Jos van Dongen
wrote: There is one other open source column based DB that might be worth looking at: Infobright Community Edition (http://www.infobright.org/) But if you want the moon on a stick, have a look at Esper, a realtime streaming database that supports the kinds of queries you mentioned in the UDF examples
Good luck, Jos van Dongen
Tom H wrote:
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Esper seems to be an event stream engine that processes live feed
effectively. I'm not clear, at least from reading their website, that I can
effectively handle query on historical data. It seems the historical data
part is taken care of by another database.
It is the historical data access part that is most important to me at the
moment.
Thanks.
On Fri, Mar 20, 2009 at 2:19 AM, Jos van Dongen
There is one other open source column based DB that might be worth looking at: Infobright Community Edition (http://www.infobright.org/)http://www.infobright.org/ But if you want the moon on a stick, have a look at Esperhttp://www.espertech.com/products/esper.php, a realtime streaming database that supports the kinds of queries you mentioned in the UDF examples
Good luck, Jos van Dongen
Tom H wrote:
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid -http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing listMonetDB-users@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Nice posting. My needs are a bit simpler than yours. I just need a data
repository so I can perform fast queries. The typical query would be
"give me the total traded volume of IBM between 10:05am and 10:09am"
"do that for every stock on the NYSE"
The data I have is the tick-by-tick data. So, they could run into billions
of row.
Those two queries are just a rough example. There are other "joining" that I
need to do.
I use mainly use R for my analysis.
I have also looked into MySQL, Lucid and MonetDB. From what I'm gathered,
MySQL is not suited for time series data; LucidDB is up to 40times slower
than MonetDB. So I've been experimenting with Monet for the last two weeks.
Observations (see separate thread on COPYing >300millions rows):
(1) I have no problem compiling that source in 64bit on the Mac OSX Leopard
(2) BUT, it is nigh impossible to load a simple 700million-by-10 numeric
dataset into the database
(3) It kept hanging after a certain amount has been processed. I've tried
mclient, sql, and JDBC - same problem.
This explains my looking into alternatives.
The other commercial ones that I've heard are Kx and FAME. Anyone know any
sight on those?
On Thu, Mar 19, 2009 at 7:14 PM, Tom H
On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Where does on get tick-by-tick data, out of curiosity?
On Fri, Mar 20, 2009 at 12:16 PM, Yue Sheng
Nice posting. My needs are a bit simpler than yours. I just need a data repository so I can perform fast queries. The typical query would be "give me the total traded volume of IBM between 10:05am and 10:09am" "do that for every stock on the NYSE" The data I have is the tick-by-tick data. So, they could run into billions of row. Those two queries are just a rough example. There are other "joining" that I need to do. I use mainly use R for my analysis. I have also looked into MySQL, Lucid and MonetDB. From what I'm gathered, MySQL is not suited for time series data; LucidDB is up to 40times slower than MonetDB. So I've been experimenting with Monet for the last two weeks. Observations (see separate thread on COPYing >300millions rows): (1) I have no problem compiling that source in 64bit on the Mac OSX Leopard (2) BUT, it is nigh impossible to load a simple 700million-by-10 numeric dataset into the database (3) It kept hanging after a certain amount has been processed. I've tried mclient, sql, and JDBC - same problem. This explains my looking into alternatives. The other commercial ones that I've heard are Kx and FAME. Anyone know any sight on those?
On Thu, Mar 19, 2009 at 7:14 PM, Tom H
wrote: On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

For people in academia, we use CRSP
On Fri, Mar 20, 2009 at 9:25 AM, Thomas Briggs
Where does on get tick-by-tick data, out of curiosity?
Nice posting. My needs are a bit simpler than yours. I just need a data repository so I can perform fast queries. The typical query would be "give me the total traded volume of IBM between 10:05am and 10:09am" "do that for every stock on the NYSE" The data I have is the tick-by-tick data. So, they could run into billions of row. Those two queries are just a rough example. There are other "joining"
On Fri, Mar 20, 2009 at 12:16 PM, Yue Sheng
wrote: that I need to do. I use mainly use R for my analysis. I have also looked into MySQL, Lucid and MonetDB. From what I'm gathered, MySQL is not suited for time series data; LucidDB is up to 40times slower than MonetDB. So I've been experimenting with Monet for the last two weeks. Observations (see separate thread on COPYing >300millions rows): (1) I have no problem compiling that source in 64bit on the Mac OSX Leopard (2) BUT, it is nigh impossible to load a simple 700million-by-10 numeric dataset into the database (3) It kept hanging after a certain amount has been processed. I've tried mclient, sql, and JDBC - same problem. This explains my looking into alternatives. The other commercial ones that I've heard are Kx and FAME. Anyone know any sight on those?
On Thu, Mar 19, 2009 at 7:14 PM, Tom H
wrote: On Thu, 2009-03-19 at 18:04 -0700, Yue Sheng wrote:
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this.
Hi,
We are currently demonstrating prototypes of our system, which is a web based interface similar to igmarkets and CMC markets, that relies on running time series queries against 3 different backends that we are currently testing. (monet, lucid and mysql)
The other column based system we are looking at is lucidDB.
The current idea is to go for performance from monet or lucid - http://www.luciddb.org/ or alternatively write some time series custom functions for MySQL.
These guys apparently have the financial time series database market nailed; http://kx.com/ but they are super expensive to deploy.
I guess I should point out that I don't have a db or CS background, I work as a tech consultant from a startup incubations perspective, so forgive me (or indeed correct me where I am wrong...;-)
My understanding is that performance wise, if you are going to run your analysis of your time series on the app/client side of the database system, then for technical analysis types of problems, performance is going to be achieved by getting a time series into a client side data structure as efficiently/quickly as possible for the app to use. And monetDB seems to perform very well at that for columns of data.
But that means that a strategy that wants to compare 200day moving averages across every time series is going to have to get all that data every time it runs the query, or cache it and summarise back to a store of aggregate data which has to be written back to a table.
Obviously you can use relevant aggregate SQL queries to improve the performance, ie SUM(), COUNT(), MAX() etc.
However ideally, we just want to plug ta-lib or some other technical analysis lib straight into the aggregate functions of the database. For example at the moment to do a query like; show me all the stocks that have declined 40% in 6 months, it would be a join like "select symbol from stocks s1 join stocks s2 on s1.symbol = s2.symbol where s1.date = NOW() and s2.date = date_sub(Now(), interval 6 months) and (s1.price < (.6 * s2.price)) (I just run that one off my head from what i remember about mysql, its probably wrong)
Anyway, but to do something like compare all stocks against bollinger bands, or Doji you are going to have to pull down all the data for the time series you want to compare, or indeed re-implement bollinger or some technical indicator from ta-lib in SQL.
So my point is, that it would be cool to have a extendible UDF system that would allow us to plugin a ta-lib library into user defined functions for example;
"Select symbol from stocks group by symbol having Today(price) < (.6 (DateAgo( interval 6 months, price ) ))"
or even better,
"Select symbol from stocks group by symbol having Today(MovingAverage(price)) < (.6 (DateAgo( interval 6 months, MovingAverage(price,50 days) ) ))"
or even better better;
"Select symbol from stocks group by symbol having Today(CallTaLib("Chande Momentum Oscillator",price)) < (.6 (DateAgo( interval 6 months, CallTAlib("Chande Momentum Oscillator",price,50 days) ) ))"
so just have an aggregate function that wraps some time series libraries into aggregate functions.
Oh, and having the moon on a stick would be nice as well!!!!
Cheers,
Tom
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Have you tried FastBit? http://sdm.lbl.gov/fastbit/ Andi. -----Original Message----- From: Yue Sheng [mailto:yuesheng8@gmail.com] Sent: Thu 3/19/2009 9:04 PM To: Communication channel for MonetDB users Subject: [MonetDB-users] Open source Column based DB I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this. Thanks.

No I haven't. Just had a look at the link. Is it an add-on for database? I
don't see anywhere that shows FastBit being a stand alone database product.
On Thu, Mar 19, 2009 at 7:44 PM, Broka, Andi
Have you tried FastBit?
Andi.
-----Original Message----- From: Yue Sheng [mailto:yuesheng8@gmail.com] Sent: Thu 3/19/2009 9:04 PM To: Communication channel for MonetDB users Subject: [MonetDB-users] Open source Column based DB
I was wondering if anyone know of other column-based DBs suited to vase amount of time series data? Tried Googling. It'd be nice to hear from someone who has any insight on this. Thanks.
------------------------------------------------------------------------------ Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are powering Web 2.0 with engaging, cross-platform capabilities. Quickly and easily build your RIAs with Flex Builder, the Eclipse(TM)based development software that enables intelligent coding and step-through debugging. Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (6)
-
Broka, Andi
-
Jos van Dongen
-
Ross Bates
-
Thomas Briggs
-
Tom H
-
Yue Sheng