Hi Miguel,
the solution to this problem would be to sort your edges on head node
ident so you can have binary search (log(n)) when you select. Monet
will automatically see that the bat is sorted and thus will use binary
search. In addition you have to sort your nodes table on node ident to
have binary search there too, and if you want you can make a copy of
edges table to have a second order on tail for backtracking etc.
Don't forget that in monet you have to keep the heads aligned and you
have to do that manually since you are in the MAL level. After
ordering head node edge, you have to align the edge id and tail node
id with the order imposed by the head node id.
good luck,
lefteris
2010/4/21 Miguel Ángel Águila
Hello,
I'm working in a research problem and we need to create a graph database. We think that MonetDB can be useful to represent this graph since the use of bats may make the graph traversal efficient. Nevertheless, SQL is limited for the type of operations that we need to do, and that is why we have decided to resort to MAL.
Our database is like this:
- Nodes: a) node ident b) name - Edges: a) edge ident b) head node ident c) tail node ident
One of the operations is to find the node with maximum out degree and, after that, to do a BFS using this node as the source.
I have stored all the information in bats:
batNode1=[oid,node ident] batNode2=[oid,name] batEdge=[oid, edge ident] batEdgeHead=[oid, head node ident] batEdgeTail=[oid,tail node ident]
For getting the node with the maximum out degree, I access batEdgeHead and store the number of occurrences of every node id in a batAux.
batAux=[node ident, times]
After that I only select the node id with the largest number of occurrences.
Now, we are ready to do the BFS. In every iteration, I have to save all the tail nodes for exploring them in the next generation. This is the pseudocode explanation:
Put the node id to NodesToVisit While there are NodesToVisit currentNode=NodesToVisit; edgeToVisit=get edges idents from batEdgeHead from currentNode ident While there are edgesToVisit currentEdge=edgesToVisit nodeFinal=get tail node ident from batEdgeTail from currentEdge ident If notExist nodeFinal in NodesToVisit put nodeFinal in NodesToVisit endIf endWhile endWhile
The main problem happens when I perform a select from batEdgeHead from the ident node. batEdgeHead might become very large and accessing it might be very expensive in terms of time. Is there any kind of indexing? I tried with crackers and AVL index but, if I am not wrong, this is only useful if I had to do the same selection more than once. The number of the nodes is 325.000 and the number of edges 925.000. At the end of this e-mail you will find the code.
May any one help in improving the performance of this code?
Best regards,
Miquel Àngel Àguila
Bat equivalence: batEdge, batNode1 --> b1:= bat.new(:oid,:lng); batNode2 --> b2:= bat.new(:oid,:str); batEdgeHead --> rh:= bat.new(:oid,:lng); batEdgeTail --> rt:= bat.new(:oid,:lng); batType --> b4:= bat.new(:oid, int) (Type: 1. nodeType1, 2. nodeType2 , 3. edgeType1, 4. edgeType2, 5. edgeType3)
Loaded the bats, here is the code:
Maxium out degree:----------------------------------------------------- edart:=algebra.select(b4,3); nodOut:=algebra.semijoin(rh,edart); res:=bat.new(:lng,:int); valor:=0; barrier (mloop16:lng ,h16:oid ,tl16:lng ) := bat.newIterator(nodOut); bite1:=algebra.exist(res,tl16); barrier ifpart:= bite1; valor:=algebra.find(res, tl16); valor:=calc.+(valor,1); res:=bat.replace(res,tl16,valor); exit ifpart; barrier elsepart:= calc.!=(bite1,true); bat.insert(res,tl16,1); exit elsepart; redo (mloop16:lng ,h16:oid ,tl16:lng ) := bat.hasMoreElements(nodOut); exit mloop16:lng ;
maxOutDegree:=aggr.max(res); node:=algebra.select(res,maxOutDegree); io.print(node); nodart:=algebra.select(b4,1); nodart2:=algebra.semijoin(b1,nodart); ident2:=algebra.join(nodart2,node); name2:=algebra.semijoin(b3,ident2); secondtime:=alarm.time(); #Traversals------------------------------------------------------------ node2:=algebra.semijoin(b1,name2); aux:=0:lng; idoid:=0@0; barrier (mloop17:lng ,h17:oid ,tl17:lng) := bat.newIterator(node2); aux:=tl17; idoid:=h17; exit mloop17:lng ; nod:=bat.new(:lng,:lng); dif:=bat.new(:lng,:lng); vis:=bat.new(:lng,:lng); ar:=bat.new(:oid,:oid); tr:=bat.new(:oid,:oid); bat.insert(nod,aux,aux); dif:=algebra.difference(nod,vis); outer6:= aggr.count(dif); rhtaux:=bat.new(:oid,:int); rhaux:=bat.new(:oid,:lng); rtaux:=bat.new(:oid,:lng); rhtaux:=algebra.select(b4,3); rtaux:=algebra.semijoin(rt,rhtaux); rhaux:=algebra.semijoin(rh,rhtaux); barrier outer:=true; barrier (mloop18:lng ,h18:lng ,tl18:lng ) := bat.newIterator(dif); # This is the problem, I need to do a lot of selects rhaux2:=algebra.select(rhaux,tl18); rtaux2:=algebra.semijoin(rtaux,rhaux2); barrier (mloop19:lng ,h19:oid,tl19:lng ) := bat.newIterator(rtaux2); bite1:=algebra.exist(nod,tl19); barrier ifpart:= bite1; bat.insert(tr,h19,h19); exit ifpart; barrier elsepart:= calc.!=(bite1,true); bat.insert(ar,h19,h19); bat.insert(nod,tl19,tl19); exit elsepart; redo (mloop19:lng ,h19:oid ,tl19:lng ) := bat.hasMoreElements(rtaux2); exit mloop19:lng ; bat.insert(vis,h18,tl18); redo (mloop18:lng ,h18:lng ,tl18:lng ) := bat.hasMoreElements(dif); exit mloop18:lng ; dif:=algebra.difference(nod,vis); outer6:= aggr.count(dif); outer:=calc.!=(outer6,0); redo outer:=calc.!=(outer6,0); exit outer;
------------------------------------------------------------------------------ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users