[Monetdb-developers] Building accelerators
Hi all, We have created a new Monet data type for one of our projects, and now want to write a set of accelerators for it. Is there any good documentation/examples on how to do that (for 4.16)? Thanks a lot, -- Agustin
Hello, Although we originally provided hooks deep in the kernel to enrich it with accelerators, experienced has shown over the years that 1) it wasn't used often, 2) it creates a low-level maintenance headache, and 3) complex datastructures and accelerators can often be modelled using ordinary BAT as representations and judicious use of the BAT operators. For volatile structures, it sometimes pays to create an in-memory structure yourself, enriched with operators of interest, and a scheme to map them to BATs for persistency. This approach is a hybrid approach. The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly. The MonetDB 5 approach also brings program transformations into view as a scheme to exploit. If you have runtime or application knowledge, you may e.g. replace sequences produced by the SQL compiler with cheaper ones. Or you might even produce plans that embody multiple alternatives, which are chosen at runtime by an application specific scheduler. The src/scheduler/mal_memo.mx illustrates how this might work. (.../Tests/memo*.mal) regards, Martin Agustin Schapira wrote:
Hi all,
We have created a new Monet data type for one of our projects, and now want to write a set of accelerators for it. Is there any good documentation/examples on how to do that (for 4.16)?
Thanks a lot,
-- Agustin
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
Thanks for the quick reply, Martin.
The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly.
Are there any specific examples that I can look at? Where at the new crackers and database partitioning modules? Thanks again, -- A
Agustin Schapira wrote:
Thanks for the quick reply, Martin.
The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly.
Are there any specific examples that I can look at? Where at the new crackers and database partitioning modules?
Thanks again,
-- A
Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory. regards, Martin
Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory.
Thanks, Martin. We'll take a look at them. Our new datatype is used to store links between objects in the database. It's a simple struct with (link_id, from_id, to_id), which we store in a central BAT. One common operation is, of course, finding the neighbors of a given object, and we would like to be able to do that efficiently, without a full table scan every time. We can build internal data structures every time we do a select on the BAT, but it would be more efficient to have a persistent data structure that we can somehow re-use. We've read the old GIS paper, and our needs are pretty similar to those. Is that code still active, somewhere? Could we take a look at it? Is there any other place where yo do similar things, such as having BATs of a new datatype and searching/joining them efficiently? Too many questions, I know ;-) Thanks! -- Agustin
Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory.
Thanks, Martin. We'll take a look at them.
Our new datatype is used to store links between objects in the database. It's a simple struct with (link_id, from_id, to_id), which we store in a hmm, this sounds like a triple store. If the number of link-ids is small enough (<100K), I would seriously consider horizontal fragmentation
central BAT. One common operation is, of course, finding the neighbors of a given object, and we would like to be able to do that efficiently, How volatile is your data? Otherwise, simply sort them on the oid and
Agustin Schapira wrote: based on it as a starting point. Then the automatic hash on the (from_id,to_id) bat gives you the answer quickly. the hash will bring you quickly to the desired plac.
without a full table scan every time. We can build internal data structures every time we do a select on the BAT, but it would be more efficient to have a persistent data structure that we can somehow re-use. This calls for re-use of the BATs as a storage scheme. It also aids in debugging your accelerators code.
We've read the old GIS paper, and our needs are pretty similar to those. Is that code still active, somewhere? Could we take a look at it? Is That code went to the attic many years ago. there any other place where yo do similar things, such as having BATs of a new datatype and searching/joining them efficiently?
Too many questions, I know ;-) Thanks!
-- Agustin
hmm, this sounds like a triple store. If the number of link-ids is small enough (<100K), I would seriously consider horizontal fragmentation based on it as a starting point. Then the automatic hash on the (from_id,to_id) bat gives you the answer quickly.
We normally have between 25 to 100 million links... ;-)
How volatile is your data? Otherwise, simply sort them on the oid and the hash will bring you quickly to the desired place.
The data change rarely. So, do you think it'd be good to keep three sorted BATs (one for link_id, one for from, and one for to) and use the appropriate ones? Thanks, -- A
participants (2)
-
Agustin Schapira
-
Martin Kersten