[Monetdb-developers] Building accelerators - developers-list - monetdb.org

newer
[Monetdb-developers] Win32 build &...

[Monetdb-developers] Building accelerators

older
Re: [Monetdb-developers]...

Agustin Schapira

10 Apr 2007 10 Apr '07

8:41 p.m.

Hi all, We have created a new Monet data type for one of our projects, and now want to write a set of accelerators for it. Is there any good documentation/examples on how to do that (for 4.16)? Thanks a lot, -- Agustin

Reply

Sign in to reply online Use email software

Show replies by date

Martin Kersten

10 Apr 10 Apr

8:56 p.m.

Hello, Although we originally provided hooks deep in the kernel to enrich it with accelerators, experienced has shown over the years that 1) it wasn't used often, 2) it creates a low-level maintenance headache, and 3) complex datastructures and accelerators can often be modelled using ordinary BAT as representations and judicious use of the BAT operators. For volatile structures, it sometimes pays to create an in-memory structure yourself, enriched with operators of interest, and a scheme to map them to BATs for persistency. This approach is a hybrid approach. The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly. The MonetDB 5 approach also brings program transformations into view as a scheme to exploit. If you have runtime or application knowledge, you may e.g. replace sequences produced by the SQL compiler with cheaper ones. Or you might even produce plans that embody multiple alternatives, which are chosen at runtime by an application specific scheduler. The src/scheduler/mal_memo.mx illustrates how this might work. (.../Tests/memo*.mal) regards, Martin Agustin Schapira wrote:

Hi all,

We have created a new Monet data type for one of our projects, and now want to write a set of accelerators for it. Is there any good documentation/examples on how to do that (for 4.16)?

Thanks a lot,

-- Agustin

------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers

Reply

Sign in to reply online Use email software

Agustin Schapira

9:05 p.m.

Thanks for the quick reply, Martin.

The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly.

Are there any specific examples that I can look at? Where at the new crackers and database partitioning modules? Thanks again, -- A

Reply

Sign in to reply online Use email software

Martin Kersten

11 Apr 11 Apr

5:05 a.m.

Agustin Schapira wrote:

Thanks for the quick reply, Martin.

...
The new crackers and database partitioning modules follow this approach. The choice for a C-data structure is often chosen when the programmer is not trained to properly use the GDK (BAT) data structures directly.

Are there any specific examples that I can look at? Where at the new crackers and database partitioning modules?

Thanks again,

-- A

Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory. regards, Martin

Reply

Sign in to reply online Use email software

Agustin Schapira

12:27 p.m.

Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory.

Thanks, Martin. We'll take a look at them. Our new datatype is used to store links between objects in the database. It's a simple struct with (link_id, from_id, to_id), which we store in a central BAT. One common operation is, of course, finding the neighbors of a given object, and we would like to be able to do that efficiently, without a full table scan every time. We can build internal data structures every time we do a select on the BAT, but it would be more efficient to have a persistent data structure that we can somehow re-use. We've read the old GIS paper, and our needs are pretty similar to those. Is that code still active, somewhere? Could we take a look at it? Is there any other place where yo do similar things, such as having BATs of a new datatype and searching/joining them efficiently? Too many questions, I know ;-) Thanks! -- Agustin

Reply

Sign in to reply online Use email software

Martin Kersten

12:48 p.m.

...
Both modules are in MonetDB5/src/modules/mal/{bpm.mx,crackers.mx} The former is under active development. It uses an internal structure with import/export to BATs as part of the session startup/exit. The latter is currently subject to reduction and cleanup to improve re-user and maintenance. It uses an AVL tree to keep a catalog of routing information in memory.

Thanks, Martin. We'll take a look at them.

Our new datatype is used to store links between objects in the database. It's a simple struct with (link_id, from_id, to_id), which we store in a hmm, this sounds like a triple store. If the number of link-ids is small enough (<100K), I would seriously consider horizontal fragmentation

central BAT. One common operation is, of course, finding the neighbors of a given object, and we would like to be able to do that efficiently, How volatile is your data? Otherwise, simply sort them on the oid and

Agustin Schapira wrote: based on it as a starting point. Then the automatic hash on the (from_id,to_id) bat gives you the answer quickly. the hash will bring you quickly to the desired plac.

without a full table scan every time. We can build internal data structures every time we do a select on the BAT, but it would be more efficient to have a persistent data structure that we can somehow re-use. This calls for re-use of the BATs as a storage scheme. It also aids in debugging your accelerators code.

We've read the old GIS paper, and our needs are pretty similar to those. Is that code still active, somewhere? Could we take a look at it? Is That code went to the attic many years ago. there any other place where yo do similar things, such as having BATs of a new datatype and searching/joining them efficiently?

Too many questions, I know ;-) Thanks!

-- Agustin

Reply

Sign in to reply online Use email software

Agustin Schapira

1:54 p.m.

hmm, this sounds like a triple store. If the number of link-ids is small enough (<100K), I would seriously consider horizontal fragmentation based on it as a starting point. Then the automatic hash on the (from_id,to_id) bat gives you the answer quickly.

We normally have between 25 to 100 million links... ;-)

How volatile is your data? Otherwise, simply sort them on the oid and the hash will bring you quickly to the desired place.

The data change rarely. So, do you think it'd be good to keep three sorted BATs (one for link_id, one for from, and one for to) and use the appropriate ones? Thanks, -- A

Reply

Sign in to reply online Use email software

6502

Age (days ago)

6503

Last active (days ago)

Download

6 comments

2 participants

tags

participants (2)

Agustin Schapira
Martin Kersten