
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 16:28
To: Jiří Nádvorník
Subject: Re: MonetDB + FITS
On 2015-Feb-23, at 13:28, Jiří Nádvorník wrote:

Hi,
I will be using this on Debian.
Ok, the ingestion seems quite straightforward, so on to the next point. The coordinates will be ra, dec (right ascension, declination). I have to decide what to do with these "coordinate types".
First option is to use some kind of indexing (if you support any) on these coordinates. Q3C and PgSphere come to mind for relational DBs.

MonetDB does do indexing (very lightweight), but it is currently managed entirely by the engine. In general you can just start querying and not worry about these things, depending on your data size; at least that is how it was designed. The first time data is accessed, an index of some sort is created. Some of these indices are persistent, so they will be preserved between server restarts. This means that any following query over the same columns will most likely be faster. If you really want to, you can also force index creation: executing a range query (e.g. "SELECT COUNT(*) FROM table WHERE column > value") will create an index on that column. MonetDB also has a module called GEOM, which supports OGC Simple Features for geospatial queries. I am not sure if it currently has the functions needed to query spherical coordinates. The module has already been considerably extended and all the changes will be in the next major version (expected in the next few months). This includes many more GIS functions as well as some optimizations for geospatial queries.
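Forcing an index this way is a one-liner. A minimal sketch, assuming a hypothetical table "objects" with the declination stored in a column named decl (avoiding "dec", which collides with the SQL DEC keyword):

-- The first range query over a column builds an index as a side
-- effect; subsequent queries filtering on decl can then reuse it.
SELECT COUNT(*) FROM objects WHERE decl > 30.0;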
Second option is to transform them into [x,y,z] coordinates on a cube and use these numbers as coordinates for dimensions.

You can certainly do that, but if an actual data transformation is required, it might be very time consuming. Such queries can very likely be done without transforming the data, albeit with a more complex SQL query, and performance might not actually suffer. MonetDB also has a module for accessing array-like data called SciQL, but it is still quite experimental, so I don't recommend using it yet.
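To illustrate the no-transformation route: unit-sphere [x,y,z] values can be computed on the fly from the angular coordinates inside the query itself, so nothing extra needs to be stored. A hedged sketch over the same hypothetical objects table, coordinates in degrees:

-- Cartesian coordinates on the unit sphere, computed per row.
SELECT ra, decl,
       COS(RADIANS(decl)) * COS(RADIANS(ra)) AS x,
       COS(RADIANS(decl)) * SIN(RADIANS(ra)) AS y,
       SIN(RADIANS(decl))                    AS z
FROM objects;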
What would you choose? The queries will be of two kinds.
First one: Select me a chunk of the sky and pass it for further processing.
This will be directly connected with how I want to partition the data (physically) on hard drives.
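The chunk selection itself can be a plain box predicate on both coordinates, which also lines up naturally with range-partitioning. A minimal sketch, with hypothetical names and bounds:

-- Select one sky chunk; the BETWEEN bounds double as partition
-- boundaries if the data is split by ra/decl ranges.
SELECT *
FROM objects
WHERE ra BETWEEN 120.0 AND 125.0
  AND decl BETWEEN -5.0 AND 0.0;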
Second one (not necessarily in the DB, a C++ parallel application is maybe a better option?) - some more fancy stuff on the result of the previous query (chunk of the sky). It will be more or less some Bayesian euclidean clustering of the rows (the result will be clusters of points with coordinates close to each other). Running this algorithm will involve searching for neighbors over and over again. So the question is whether I want to keep this in the DB, or should I load it into the memory of a C++ application and search for the neighbors and clusters there?

For both kinds of queries I would use a database, not only because I work on one, but because the work/research done on database management systems in general has made them very efficient when it comes to accessing data.
For the more advanced analysis you can still use a database, and if you stick with MonetDB you can actually do the clustering right in the database. As of this release, we have support for in-database processing using R [1]. You can use an R function to perform Bayesian clustering directly on the data in the system, without having to export it or pass it to another application. Aside from the performance benefits of reduced data shuffling, this should also limit the number of processing steps you'll need to implement yourself. This approach has been used by researchers going over large volumes of data, looking for events/records that stand out.
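For a flavor of what that looks like, here is a hedged sketch, not actual code from this thread: the objects table is hypothetical and plain k-means from base R stands in for the Bayesian clustering:

-- A MonetDB/R table-producing function; column arguments arrive
-- in R as vectors covering all input rows at once.
CREATE FUNCTION cluster_chunk(ra DOUBLE, decl DOUBLE)
RETURNS TABLE (ra DOUBLE, decl DOUBLE, cluster_id INTEGER)
LANGUAGE R {
    # k-means as a placeholder; swap in any R clustering package.
    km <- kmeans(cbind(ra, decl), centers = 10)
    data.frame(ra = ra, decl = decl, cluster_id = as.integer(km$cluster))
};

-- Cluster one sky chunk entirely inside the database.
SELECT * FROM cluster_chunk(
    (SELECT ra, decl FROM objects
     WHERE ra BETWEEN 120.0 AND 125.0 AND decl BETWEEN -5.0 AND 0.0)
);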
This is actually the biggest performance issue. I have read lots of papers where a Bayesian approach was used for approx. 500,000 objects. We currently have 4e8 objects and expect to have 6e8 objects in the end. That leads me to the opinion that some kind of parallelization will be needed (the clustering algorithm will have to run in parallel on chunks of the sky, and then some algorithm above that will have to resolve the overlapping regions). Can something like that be written in the R module?
My suggestion is to extract a small dataset, prepare your data ingestion procedures and try out the small queries (select me a chunk of the sky). Since you would be working on a small amount of data, the actual data processing should not take much time. This way you can verify and refine your process. When it comes to the actual queries, I would recommend starting with simple queries and seeing if they do the work, crafting more complex ones if needed.
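For carving out such a test dataset, MonetDB's SAMPLE clause may help; a small sketch, names again hypothetical:

-- Materialize a random subset of rows to experiment on.
CREATE TABLE objects_sample AS
SELECT * FROM objects SAMPLE 100000
WITH DATA;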
That must be my approach, and I will have to write the whole thing before I can test its actual usefulness on big data. But that's the beauty of this job :) everything we have done in the past months may turn out to be worthless, but surely not the things we learn from it.
I hope this answers some of your questions.
Yes it does, thank you very much for such forthcoming communication.

Kind regards,
Jiri
Best regards, Dimitar
[1] https://www.monetdb.org/content/embedded-r-monetdb
Thank you for your suggestions.
Jiri N.
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 12:52
To: Jiří Nádvorník
Subject: Re: MonetDB + FITS
Hi Jiří,
Absolutely. If you have the data in another database, exporting it and importing it into MonetDB shouldn't be a problem.
To dump a PostgreSQL database to an .sql file you can use the "pg_dump" utility. You can then load it into MonetDB from "mclient". Here's more info on how to load a database dump in MonetDB: https://www.monetdb.org/Documentation/UserGuide/DumpRestore

There are of course some differences between the two databases' SQL interfaces, which might require some changes to the .sql file, like adapting data types, but PostgreSQL and MonetDB data types are similar.
Dumping to CSV is also an option, and it can eliminate potential SQL incompatibility issues. Loading CSV files also happens through "mclient". First you need to create the table to load the data into. Loading then basically involves invoking the "COPY INTO" SQL command, specifying the table to load the data into, the location of the file and the delimiters. Here's more info on that:
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData
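In concrete terms the CSV route is just two statements. A minimal sketch, with a hypothetical table layout and file path:

-- The target table must exist before the load.
CREATE TABLE objects (ra DOUBLE, decl DOUBLE, mag DOUBLE);

-- Bulk-load a comma-separated file; '\n' terminates a record.
COPY INTO objects FROM '/path/to/objects.csv'
USING DELIMITERS ',', '\n';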
The FITS support was developed in conjunction with astronomers (mostly from ASTRON - the Dutch institute for radio astronomy), the main idea being convenience of data ingestion from large repositories of FITS files. As such, it might pay off to use it if you need to load a large volume of FITS files.
By the way, what kind of operating system are you going to use MonetDB on? We are interested in what operating systems our users have.
Best regards, Dimitar
On 2015-Feb-23, at 12:00, Jiří Nádvorník wrote:

Hi Dimitar,
This sounds a bit complicated. I have the data already ingested into a PostgreSQL database (in more or less one table). Wouldn't it be easier to import it from there? Or from some other simple structure like CSV?
I don't really need to work with the files as such - they are just a means of holding the data for me.
Kind regards,
Jiri N.
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 11:09
To: Jiří Nádvorník
Cc: MonetDB Users
Subject: Re: MonetDB + FITS
Hi Jiří,
This might be a bit long, but please bear with me: if you do it once, it's relatively simple. Using FITS currently takes 4 general steps:
1) Build MonetDB with FITS support
2) Create the SQL functions for FITS data loading
3) Attach a FITS file for loading
4) Load the data from the file
1) For FITS to work, you must build MonetDB together with the 'cfitsio' library. If you are using OS X, you can install MonetDB via Homebrew, which you can also configure to build the binaries with FITS support: "brew install monetdb --with-cfitsio". If you are building MonetDB on another UNIX/Linux system, you need to have the cfitsio library present on your path when configuring the build. If its presence is detected, MonetDB will be built with FITS support. You can find more on building MonetDB here: https://www.monetdb.org/Developers/SourceCompile
2) In all releases of MonetDB, including the latest (Oct2014-SP2), the FITS SQL functions for data loading are not created automatically, even if FITS support is enabled. This is, in a way, only cosmetic, since the FITS support is there, just not conveniently exposed at the SQL level. Still, if you are building MonetDB from the latest stable sources, as Homebrew will by default, you must first define the required SQL functions like this:

create procedure fitsattach(fname string) external name fits.attach;
create procedure fitsload(tname string) external name fits.load;
create procedure listdirpat(dirname string, pat string) external name fits.listdirpattern;
3) Then you can run the 'attach' function to pre-process your file:

call fitsattach('/Users/dnedev/monetdb/r564532_cat.fits');
At this point you can already take a glimpse at your (meta-)data.

Listing the attached files:
select * from fits_files;

Listing the catalogs in each file:
select * from fits_tables;

Listing the columns contained in the catalogs:
select * from fits_columns;
4) Finally, load the data from the FITS file, specifying separately each catalog to be loaded, as listed in "fits_tables", e.g.:

call fitsload('r564532_cat_1');
call fitsload('r564532_cat_2');
call fitsload('r564532_cat_3');
call fitsload('r564532_cat_4');
The data should now be loaded into a table with the same name as the catalog, which you can access like this:

select * from "r564532_cat_1";
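As an aside, the listdirpat procedure defined in step 2 suggests that a whole directory of FITS files can be attached in one call by matching a filename pattern. A hedged, untested sketch, since the exact pattern syntax isn't covered in this thread:

-- Attach every matching FITS file in the directory at once,
-- instead of one fitsattach() per file (pattern syntax may vary).
call listdirpat('/Users/dnedev/monetdb', '%.fits');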
As of the next MonetDB release, the FITS functions will be created automatically, eliminating the second step. This also means that if you are building MonetDB from sources checked out from our repository, rather than from the stable release sources, the functions will be created for you. We are also working on improving the documentation and usability of these functions.
I hope this helps.
Best regards, Dimitar
On 2015-Feb-23, at 09:55, Jiří Nádvorník wrote:

Hi All,
I have a simple question – how do I set up and ingest data into MonetDB from FITS binary tables (FITS extensions)? Namely, this will be astronomical objects (1 row = coordinates, magnitude, …).
Thank you very much for info,
Jiri Nadvornik
Astronomical Institute AV CR
Stellar department
Czech Republic
nadvornik.ji@gmail.com
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list