
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 16:28
To: Jiří Nádvorník
Subject: Re: MonetDB + FITS
On 2015-Feb-23, at 13:28, Jiří Nádvorník wrote:

Hi,
I will be using this on Debian.
Ok, the ingestion seems quite straightforward, so on to the next point. The coordinates will be ra, dec (right ascension, declination). I have to decide what to do with these "coordinate types".
First option is to use some kind of indexing (if you support any) on these coordinates. Q3C and PgSphere come to mind for relational DBs.

MonetDB does do indexing (very lightweight), but it is currently managed entirely by the engine. In general you can just start querying and not worry about these things, depending on your data size; at least that is how it was designed. The first time data is accessed, an index of some sort is created. Some of these indices are persistent, so they will be preserved between server restarts. This means that any following query over the same columns will most likely be faster. If you really want to, you can also force index creation: executing a range query (e.g. "SELECT COUNT(*) FROM table WHERE column > value") will create an index on that column. MonetDB also has a module called GEOM, which supports OGC Simple Features for geospatial queries. I am not sure if it currently has the functions needed to query spherical coordinates. The module has already been considerably extended and all the changes will be in the next major version (expected in the next few months). This includes many more GIS functions as well as some optimizations for geospatial queries.
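Forcing an index this way is a one-liner. A minimal sketch, assuming a hypothetical table "objects" with the declination stored in a column named decl (avoiding "dec", which collides with the SQL DEC keyword):

-- The first range query over a column builds an index as a side
-- effect; subsequent queries filtering on decl can then reuse it.
SELECT COUNT(*) FROM objects WHERE decl > 30.0;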
Second option is to transform them into [x,y,z] coordinates on a cube and use these numbers as coordinates for dimensions.

You can certainly do that, but if an actual data transformation is required, it might be very time consuming. Such queries can very likely be done without transforming the data, albeit with a more complex SQL query, and performance might not actually suffer. MonetDB also has a module for accessing array-like data called SciQL, but it is still quite experimental, so I don't recommend using it yet.
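To illustrate the no-transformation route: unit-sphere [x,y,z] values can be computed on the fly from the angular coordinates inside the query itself, so nothing extra needs to be stored. A hedged sketch over the same hypothetical objects table, coordinates in degrees:

-- Cartesian coordinates on the unit sphere, computed per row.
SELECT ra, decl,
       COS(RADIANS(decl)) * COS(RADIANS(ra)) AS x,
       COS(RADIANS(decl)) * SIN(RADIANS(ra)) AS y,
       SIN(RADIANS(decl))                    AS z
FROM objects;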
What would you choose? The queries will be of two kinds.
First one: Select me a chunk of the sky and pass it for further processing.
This will be directly connected with how I want to partition the data (physically) on hard drives.
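The chunk selection itself can be a plain box predicate on both coordinates, which also lines up naturally with range-partitioning. A minimal sketch, with hypothetical names and bounds:

-- Select one sky chunk; the BETWEEN bounds double as partition
-- boundaries if the data is split by ra/decl ranges.
SELECT *
FROM objects
WHERE ra BETWEEN 120.0 AND 125.0
  AND decl BETWEEN -5.0 AND 0.0;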
Second one (not necessarily in the DB, a C++ parallel application is maybe a better option?) - some more fancy stuff on the result of the previous query (chunk of the sky). It will be more or less some Bayesian euclidean clustering of the rows (the result will be clusters of points with coordinates close to each other). Running this algorithm will involve searching for neighbors over and over again. So the question is whether I want to keep this in the DB, or should I load it into the memory of a C++ application and search for the neighbors and clusters there?

For both kinds of queries I would use a database, not only because I work on one, but because the work/research done on database management systems in general has made them very efficient when it comes to accessing data.
For the more advanced analysis you can still use a database, and if you stick with MonetDB you can actually do the clustering right in the database. As of this release, we have support for in-database processing using R [1]. You can use an R function to perform Bayesian clustering directly on the data in the system, without having to export it or pass it to another application. Aside from the performance benefits of reduced data shuffling, this should also limit the number of processing steps you'll need to implement yourself. This approach has been used by researchers going over large volumes of data, looking for events/records that stand out.
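For a flavor of what that looks like, here is a hedged sketch, not actual code from this thread: the objects table is hypothetical and plain k-means from base R stands in for the Bayesian clustering:

-- A MonetDB/R table-producing function; column arguments arrive
-- in R as vectors covering all input rows at once.
CREATE FUNCTION cluster_chunk(ra DOUBLE, decl DOUBLE)
RETURNS TABLE (ra DOUBLE, decl DOUBLE, cluster_id INTEGER)
LANGUAGE R {
    # k-means as a placeholder; swap in any R clustering package.
    km <- kmeans(cbind(ra, decl), centers = 10)
    data.frame(ra = ra, decl = decl, cluster_id = as.integer(km$cluster))
};

-- Cluster one sky chunk entirely inside the database.
SELECT * FROM cluster_chunk(
    (SELECT ra, decl FROM objects
     WHERE ra BETWEEN 120.0 AND 125.0 AND decl BETWEEN -5.0 AND 0.0)
);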
This is actually the biggest performance issue. I have read lots of papers where a Bayesian approach was used for approx. 500,000 objects. We currently have 4e8 objects and expect to have 6e8 objects in the end. That leads me to the opinion that some kind of parallelization will be needed (the clustering algorithm will have to run in parallel on chunks of the sky, and then some algorithm above that will have to resolve the overlapping regions). Can something like that be written in the R module?
My suggestion is to extract a small dataset, prepare your data ingestion procedures and try out the small queries (select me a chunk of the sky). Since you would be working on a small amount of data, the actual data processing should not take much time. This way you can verify and refine your process. When it comes to the actual queries, I would recommend starting with simple queries and seeing if they do the work, crafting more complex ones if needed.
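For carving out such a test dataset, MonetDB's SAMPLE clause may help; a small sketch, names again hypothetical:

-- Materialize a random subset of rows to experiment on.
CREATE TABLE objects_sample AS
SELECT * FROM objects SAMPLE 100000
WITH DATA;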
That must be my approach, and I will have to write the whole thing before I can test its actual usefulness on big data. But that's the beauty of this job :) everything we have done in the past months may turn out to be worthless, but surely not the things we learn from it.
I hope this answers some of your questions.
Yes it does, thank you very much for such forthcoming communication.

Kind regards,
Jiri
Best regards, Dimitar
[1] https://www.monetdb.org/content/embedded-r-monetdb
Thank you for your suggestions.
Jiri N.
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 12:52
To: Jiří Nádvorník
Subject: Re: MonetDB + FITS
Hi Jiří,
Absolutely. If you have the data in another database, exporting it and importing it into MonetDB shouldn't be a problem.
To dump a PostgreSQL database to an .sql file you can use the "pg_dump" utility. You can then load it into MonetDB from "mclient". Here's more info on how to load a database dump in MonetDB: https://www.monetdb.org/Documentation/UserGuide/DumpRestore

There are of course some differences between the two databases' SQL interfaces, which might require some changes to the .sql file, like adapting data types, but PostgreSQL and MonetDB data types are similar.
Dumping to CSV is also an option, and it can eliminate potential SQL incompatibility issues. Loading CSV files also happens through "mclient". First you need to create the table to load the data into. Loading then basically involves invoking the "COPY INTO" SQL command, specifying the table to load the data into, the location of the file and the delimiters. Here's more info on that:
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData
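In concrete terms the CSV route is just two statements. A minimal sketch, with a hypothetical table layout and file path:

-- The target table must exist before the load.
CREATE TABLE objects (ra DOUBLE, decl DOUBLE, mag DOUBLE);

-- Bulk-load a comma-separated file; '\n' terminates a record.
COPY INTO objects FROM '/path/to/objects.csv'
USING DELIMITERS ',', '\n';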
The FITS support was developed in conjunction with astronomers (mostly from ASTRON - the Dutch institute for radio astronomy), the main idea being convenience of data ingestion from large repositories of FITS files. As such, it might pay off to use it if you need to load a large volume of FITS files.
By the way, what kind of operating system are you going to use MonetDB on? We are interested in what operating systems our users have.
Best regards, Dimitar
On 2015-Feb-23, at 12:00, Jiří Nádvorník wrote:

Hi Dimitar,
This sounds a bit complicated. I have the data already ingested into a PostgreSQL database (in more or less one table). Wouldn't it be easier to import it from there? Or from some other simple structure like CSV?
I don't really need to work with the files as such - they are just a means of holding the data for me.
Kind regards,
Jiri N.
-----Original Message-----
From: Dimitar Nedev [mailto:dimitar.nedev@monetdbsolutions.com]
Sent: 23 February 2015 11:09
To: Jiří Nádvorník
Cc: MonetDB Users
Subject: Re: MonetDB + FITS
Hi Jiří,
This might be a bit long, but please bear with me: if you do it once, it's relatively simple. Using FITS currently takes 4 general steps:
1) Build MonetDB with FITS support
2) Create the SQL functions for FITS data loading
3) Attach a FITS file for loading
4) Load the data from the file
1) For FITS to work, you must build MonetDB together with the 'cfitsio' library. If you are using OS X, you can install MonetDB via Homebrew, which you can also configure to build the binaries with FITS support: "brew install monetdb --with-cfitsio". If you are building MonetDB on another UNIX/Linux system, you need to have the cfitsio library present on your path when configuring the build. If its presence is detected, MonetDB will be built with FITS support. You can find more on building MonetDB here: https://www.monetdb.org/Developers/SourceCompile
2) In all releases of MonetDB, including the latest (Oct2014-SP2), the FITS SQL functions for data loading are not created automatically, even if FITS support is enabled. This is, in a way, only cosmetic, since the FITS support is there, just not conveniently exposed at the SQL level. Still, if you are building MonetDB from the latest stable sources, as Homebrew will by default, you must first define the required SQL functions like this:

create procedure fitsattach(fname string) external name fits.attach;
create procedure fitsload(tname string) external name fits.load;
create procedure listdirpat(dirname string, pat string) external name fits.listdirpattern;
3) Then you can run the 'attach' function to pre-process your file:

call fitsattach('/Users/dnedev/monetdb/r564532_cat.fits');
At this point you can already take a glimpse at your (meta-)data.

Listing the attached files:
select * from fits_files;

Listing the catalogs in each file:
select * from fits_tables;

Listing the columns contained in the catalogs:
select * from fits_columns;
4) Finally, load the data from the FITS file, specifying separately each catalog to be loaded, as listed in "fits_tables", e.g.:

call fitsload('r564532_cat_1');
call fitsload('r564532_cat_2');
call fitsload('r564532_cat_3');
call fitsload('r564532_cat_4');
The data should now be loaded into a table with the same name as the catalog, which you can access like this:

select * from "r564532_cat_1";
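As an aside, the listdirpat procedure defined in step 2 suggests that a whole directory of FITS files can be attached in one call by matching a filename pattern. A hedged, untested sketch, since the exact pattern syntax isn't covered in this thread:

-- Attach every matching FITS file in the directory at once,
-- instead of one fitsattach() per file (pattern syntax may vary).
call listdirpat('/Users/dnedev/monetdb', '%.fits');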
As of the next MonetDB release, the FITS functions will be created automatically, eliminating the second step. This also means that if you are building MonetDB from sources checked out from our repository, rather than from the stable release sources, the functions will be created for you. We are also working on improving the documentation and usability of these functions.
I hope this helps.
Best regards, Dimitar
On 2015-Feb-23, at 09:55, Jiří Nádvorník wrote:

Hi All,
I have a simple question – how do I set up and ingest data into MonetDB from FITS binary tables (FITS extensions)? Namely, this will be astronomical objects (1 row = coordinates, magnitude, …).
Thank you very much for info,
Jiri Nadvornik
Astronomical Institute AV CR
Stellar department
Czech Republic
nadvornik.ji@gmail.com
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list