Distributed Join and Replica Tables
Can someone describe how exactly replica tables are used to push joins to worker nodes? Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node. Is that the general strategy? Thanks, -K
On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly, This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes. But how exactly replica tables are used, and joins are handled largely depends on your query. Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Can someone please provide more concrete information on this? Is it known
to work?
Say I have a master and two worker nodes.
I have a usr table on master and remote versions of this table on each
worker:
sql>select * from usr;
+------+-------+
| id | name |
+======+=======+
| 1 | kelly |
| 2 | fred |
| 3 | bob |
| 4 | igor |
+------+-------+
4 tuples (3.239ms)
I have an activity table that is sharded across the two workers, each
remoted to master and a merge table (called activity) combining them. On
worker 0:
sql>select * from activity_w0;
+--------+------+
| usr_id | what |
+========+======+
| 1 | act1 |
| 3 | act3 |
| 1 | act5 |
| 3 | act8 |
+--------+------+
4 tuples (2.331ms)
On worker 1:
sql>select * from activity_w1;
+--------+------+
| usr_id | what |
+========+======+
| 2 | act2 |
| 4 | act4 |
| 2 | act6 |
| 4 | act7 |
+--------+------+
4 tuples (3.899ms)
So worker 0 has only activity for odd usr ids, worker 1 has only activity
for even worker ids. If I perform this query on master:
sql>select
more> what
more>from
more> activity a
more> join usr u on (a.usr_id = u.id)
more>where
more> u.name = 'kelly';
+------+
| what |
+======+
| act1 |
| act5 |
+------+
2 tuples (10.822ms)
I assume that master is pulling back the entire contents of both
activity_w0 and activity_w1 because it doesn't know that there is a remote
usr table on each worker. It is not clear to me how to set up a replica
table on the workers to allow the filtering to be done there rather than at
the master.
Is this possible or am I chasing a red herring?
-K
On Mon, May 8, 2017 at 6:31 AM, Ying Zhang
On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly,
This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.
But how exactly replica tables are used, and joins are handled largely depends on your query.
Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Kelly,
Maybe you should create a view on the worker and reference it from the
master i am not sure if this works as i have never tried it as of yet.
Let me know of your findings.
Regards.
Brian Hood
On Mon, May 8, 2017 at 5:09 PM, Kelly Burkhart
Can someone please provide more concrete information on this? Is it known to work?
Say I have a master and two worker nodes.
I have a usr table on master and remote versions of this table on each worker:
sql>select * from usr; +------+-------+ | id | name | +======+=======+ | 1 | kelly | | 2 | fred | | 3 | bob | | 4 | igor | +------+-------+ 4 tuples (3.239ms)
I have an activity table that is sharded across the two workers, each remoted to master and a merge table (called activity) combining them. On worker 0:
sql>select * from activity_w0; +--------+------+ | usr_id | what | +========+======+ | 1 | act1 | | 3 | act3 | | 1 | act5 | | 3 | act8 | +--------+------+ 4 tuples (2.331ms)
On worker 1:
sql>select * from activity_w1; +--------+------+ | usr_id | what | +========+======+ | 2 | act2 | | 4 | act4 | | 2 | act6 | | 4 | act7 | +--------+------+ 4 tuples (3.899ms)
So worker 0 has only activity for odd usr ids, worker 1 has only activity for even worker ids. If I perform this query on master:
sql>select more> what more>from more> activity a more> join usr u on (a.usr_id = u.id) more>where more> u.name = 'kelly'; +------+ | what | +======+ | act1 | | act5 | +------+ 2 tuples (10.822ms)
I assume that master is pulling back the entire contents of both activity_w0 and activity_w1 because it doesn't know that there is a remote usr table on each worker. It is not clear to me how to set up a replica table on the workers to allow the filtering to be done there rather than at the master.
Is this possible or am I chasing a red herring?
-K
On Mon, May 8, 2017 at 6:31 AM, Ying Zhang
wrote: On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly,
This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.
But how exactly replica tables are used, and joins are handled largely depends on your query.
Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
The activity table on master is a merge table of remote tables activity_w0
and activity_w1 on the workers, which is essentially a view. The query
"works" in the sense that it returns the correct results. The issue is
that the query plan pulls the entire activity_wX tables back to the master
node and filters rows there rather than filtering at the worker nodes and
just doing a merge at the master.
-K
On Mon, May 8, 2017 at 5:59 PM, Brian Hood
Hi Kelly,
Maybe you should create a view on the worker and reference it from the master i am not sure if this works as i have never tried it as of yet.
Let me know of your findings.
Regards.
Brian Hood
On Mon, May 8, 2017 at 5:09 PM, Kelly Burkhart
wrote: Can someone please provide more concrete information on this? Is it known to work?
Say I have a master and two worker nodes.
I have a usr table on master and remote versions of this table on each worker:
sql>select * from usr; +------+-------+ | id | name | +======+=======+ | 1 | kelly | | 2 | fred | | 3 | bob | | 4 | igor | +------+-------+ 4 tuples (3.239ms)
I have an activity table that is sharded across the two workers, each remoted to master and a merge table (called activity) combining them. On worker 0:
sql>select * from activity_w0; +--------+------+ | usr_id | what | +========+======+ | 1 | act1 | | 3 | act3 | | 1 | act5 | | 3 | act8 | +--------+------+ 4 tuples (2.331ms)
On worker 1:
sql>select * from activity_w1; +--------+------+ | usr_id | what | +========+======+ | 2 | act2 | | 4 | act4 | | 2 | act6 | | 4 | act7 | +--------+------+ 4 tuples (3.899ms)
So worker 0 has only activity for odd usr ids, worker 1 has only activity for even worker ids. If I perform this query on master:
sql>select more> what more>from more> activity a more> join usr u on (a.usr_id = u.id) more>where more> u.name = 'kelly'; +------+ | what | +======+ | act1 | | act5 | +------+ 2 tuples (10.822ms)
I assume that master is pulling back the entire contents of both activity_w0 and activity_w1 because it doesn't know that there is a remote usr table on each worker. It is not clear to me how to set up a replica table on the workers to allow the filtering to be done there rather than at the master.
Is this possible or am I chasing a red herring?
-K
On Mon, May 8, 2017 at 6:31 AM, Ying Zhang
wrote: On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly,
This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.
But how exactly replica tables are used, and joins are handled largely depends on your query.
Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Kelly, maybe by now you're way past that, but for the record a short note about the use of MERGE TABLE and REPLICA TABLE. Let's say you have a master node 'master' and two slave nodes 'slave1' and 'slave2'. On slave1 you have created and filled tables foo1 and bar1. On slave2 you have created and filled tables foo2 and bar2. On 'master' you can now do something like this: CREATE REMOTE TABLE foo1 ( ... ) ON 'mapi:monetdb://slave1:50000/mydb'; CREATE REMOTE TABLE bar1 ( ... ) ON 'mapi:monetdb://slave1:50000/mydb'; CREATE REMOTE TABLE foo2 ( ... ) ON 'mapi:monetdb://slave2:50000/mydb'; CREATE REMOTE TABLE bar2 ( ... ) ON 'mapi:monetdb://slave2:50000/mydb'; -- Tables foo1 and foo2 contain parts of the data set. CREATE MERGE TABLE foo ( ... ); ALTER TABLE foo ADD TABLE foo1; ALTER TABLE foo ADD TABLE foo2; -- Tables bar1 and bar2 have identical content; the query planner can -- pick whichever is most convenient CREATE REPLICA TABLE bar ( ... ); ALTER TABLE bar ADD TABLE bar1; ALTER TABLE bar ADD TABLE bar2; Note that for MERGE TABLE performance it is a good idea to run something like the following on every partition: ALTER TABLE foo1 SET READ ONLY; ANALYZE foo1; This will allow the optimizer to skip those partitions that for example contain only irrelevant date ranges. Joeri On Tue, May 09, 2017 at 08:36:01AM -0500, Kelly Burkhart wrote:
The activity table on master is a merge table of remote tables activity_w0 and activity_w1 on the workers, which is essentially a view. The query "works" in the sense that it returns the correct results. The issue is that the query plan pulls the entire activity_wX tables back to the master node and filters rows there rather than filtering at the worker nodes and just doing a merge at the master.
-K
On Mon, May 8, 2017 at 5:59 PM, Brian Hood
wrote: Hi Kelly,
Maybe you should create a view on the worker and reference it from the master i am not sure if this works as i have never tried it as of yet.
Let me know of your findings.
Regards.
Brian Hood
On Mon, May 8, 2017 at 5:09 PM, Kelly Burkhart
wrote: Can someone please provide more concrete information on this? Is it known to work?
Say I have a master and two worker nodes.
I have a usr table on master and remote versions of this table on each worker:
sql>select * from usr; +------+-------+ | id | name | +======+=======+ | 1 | kelly | | 2 | fred | | 3 | bob | | 4 | igor | +------+-------+ 4 tuples (3.239ms)
I have an activity table that is sharded across the two workers, each remoted to master and a merge table (called activity) combining them. On worker 0:
sql>select * from activity_w0; +--------+------+ | usr_id | what | +========+======+ | 1 | act1 | | 3 | act3 | | 1 | act5 | | 3 | act8 | +--------+------+ 4 tuples (2.331ms)
On worker 1:
sql>select * from activity_w1; +--------+------+ | usr_id | what | +========+======+ | 2 | act2 | | 4 | act4 | | 2 | act6 | | 4 | act7 | +--------+------+ 4 tuples (3.899ms)
So worker 0 has only activity for odd usr ids, worker 1 has only activity for even worker ids. If I perform this query on master:
sql>select more> what more>from more> activity a more> join usr u on (a.usr_id = u.id) more>where more> u.name = 'kelly'; +------+ | what | +======+ | act1 | | act5 | +------+ 2 tuples (10.822ms)
I assume that master is pulling back the entire contents of both activity_w0 and activity_w1 because it doesn't know that there is a remote usr table on each worker. It is not clear to me how to set up a replica table on the workers to allow the filtering to be done there rather than at the master.
Is this possible or am I chasing a red herring?
-K
On Mon, May 8, 2017 at 6:31 AM, Ying Zhang
wrote: On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly,
This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.
But how exactly replica tables are used, and joins are handled largely depends on your query.
Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
As a side note, you can look at the output of PLAN SELECT bla FROM etc to see how the query is distributed over the slaves. If a subquery is executed on a slave, you will see something like REMOTE mapi:monetdb://localhost:50000/slave1 [ L1.i, people.name, location.loc ] appended at the final line of the subquery. For example, sql>PLAN SELECT * FROM people NATURAL JOIN location; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | union ( | | | table | | | | project ( | | | | | join ( | | | | | | project ( | | | | | | | REMOTE(repli.part1) [ part1.i as people.i, part1.name as people.name, part1.%TID% NOT NULL as people.%TID% ] COUNT | | | | | | ) [ people.i, people.name, people.%TID% NOT NULL ], | | | | | | REMOTE(repli.location1) [ location1.i as location.i, location1.loc as location.loc, location1.%TID% NOT NULL as location.%TID% ] | | | | | ) [ people.i = location.i ] | | | | ) [ people.i as L1.i, people.name, location.loc ] REMOTE mapi:monetdb://localhost:50000/slave1 [ L1.i, people.name, location.loc ], | | | table | | | | project ( | | | | | join ( | | | | | | project ( | | | | | | | REMOTE(repli.part2) [ part2.i as people.i, part2.name as people.name, part2.%TID% NOT NULL as people.%TID% ] COUNT | | | | | | ) [ people.i, people.name, people.%TID% NOT NULL ], | | | | | | REMOTE(repli.location2) [ location2.i as location.i, location2.loc as location.loc, location2.%TID% NOT NULL as location.%TID% ] | | | | | ) [ people.i = location.i ] | | | | ) [ people.i as L1.i, people.name, location.loc ] REMOTE mapi:monetdb://localhost:50000/slave2 [ L1.i, people.name, location.loc ] | | ) [ L1.i, people.name, location.loc ] | +------------------------------------------------------------------------------------------------------------------------------------------+ To summarize, +------------------------------------------------------------------------------+ | rel | +==============================================================================+ | union ( | | | table | | | | project (...) [...] REMOTE mapi:monetdb://localhost:50000/slave1 [...], | | | table | | | | project (...) [...] REMOTE mapi:monetdb://localhost:50000/slave2 [...] | | ) [ L1.i, people.name, location.loc ] | +------------------------------------------------------------------------------+ The joins are executed remotely and union'ed on the master. Joeri On Thu, May 11, 2017 at 04:27:14PM +0200, Joeri van Ruth wrote:
Hi Kelly,
maybe by now you're way past that, but for the record a short note about the use of MERGE TABLE and REPLICA TABLE.
Let's say you have a master node 'master' and two slave nodes 'slave1' and 'slave2'.
On slave1 you have created and filled tables foo1 and bar1.
On slave2 you have created and filled tables foo2 and bar2.
On 'master' you can now do something like this:
CREATE REMOTE TABLE foo1 ( ... ) ON 'mapi:monetdb://slave1:50000/mydb'; CREATE REMOTE TABLE bar1 ( ... ) ON 'mapi:monetdb://slave1:50000/mydb';
CREATE REMOTE TABLE foo2 ( ... ) ON 'mapi:monetdb://slave2:50000/mydb'; CREATE REMOTE TABLE bar2 ( ... ) ON 'mapi:monetdb://slave2:50000/mydb';
-- Tables foo1 and foo2 contain parts of the data set. CREATE MERGE TABLE foo ( ... ); ALTER TABLE foo ADD TABLE foo1; ALTER TABLE foo ADD TABLE foo2;
-- Tables bar1 and bar2 have identical content; the query planner can -- pick whichever is most convenient CREATE REPLICA TABLE bar ( ... ); ALTER TABLE bar ADD TABLE bar1; ALTER TABLE bar ADD TABLE bar2;
Note that for MERGE TABLE performance it is a good idea to run something like the following on every partition:
ALTER TABLE foo1 SET READ ONLY; ANALYZE foo1;
This will allow the optimizer to skip those partitions that for example contain only irrelevant date ranges.
Joeri
On Tue, May 09, 2017 at 08:36:01AM -0500, Kelly Burkhart wrote:
The activity table on master is a merge table of remote tables activity_w0 and activity_w1 on the workers, which is essentially a view. The query "works" in the sense that it returns the correct results. The issue is that the query plan pulls the entire activity_wX tables back to the master node and filters rows there rather than filtering at the worker nodes and just doing a merge at the master.
-K
On Mon, May 8, 2017 at 5:59 PM, Brian Hood
wrote: Hi Kelly,
Maybe you should create a view on the worker and reference it from the master i am not sure if this works as i have never tried it as of yet.
Let me know of your findings.
Regards.
Brian Hood
On Mon, May 8, 2017 at 5:09 PM, Kelly Burkhart
wrote: Can someone please provide more concrete information on this? Is it known to work?
Say I have a master and two worker nodes.
I have a usr table on master and remote versions of this table on each worker:
sql>select * from usr; +------+-------+ | id | name | +======+=======+ | 1 | kelly | | 2 | fred | | 3 | bob | | 4 | igor | +------+-------+ 4 tuples (3.239ms)
I have an activity table that is sharded across the two workers, each remoted to master and a merge table (called activity) combining them. On worker 0:
sql>select * from activity_w0; +--------+------+ | usr_id | what | +========+======+ | 1 | act1 | | 3 | act3 | | 1 | act5 | | 3 | act8 | +--------+------+ 4 tuples (2.331ms)
On worker 1:
sql>select * from activity_w1; +--------+------+ | usr_id | what | +========+======+ | 2 | act2 | | 4 | act4 | | 2 | act6 | | 4 | act7 | +--------+------+ 4 tuples (3.899ms)
So worker 0 has only activity for odd usr ids, worker 1 has only activity for even worker ids. If I perform this query on master:
sql>select more> what more>from more> activity a more> join usr u on (a.usr_id = u.id) more>where more> u.name = 'kelly'; +------+ | what | +======+ | act1 | | act5 | +------+ 2 tuples (10.822ms)
I assume that master is pulling back the entire contents of both activity_w0 and activity_w1 because it doesn't know that there is a remote usr table on each worker. It is not clear to me how to set up a replica table on the workers to allow the filtering to be done there rather than at the master.
Is this possible or am I chasing a red herring?
-K
On Mon, May 8, 2017 at 6:31 AM, Ying Zhang
wrote: On 5 May 2017, at 23:07, Kelly Burkhart
wrote: Can someone describe how exactly replica tables are used to push joins to worker nodes?
Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.
Hai Kelly,
This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.
But how exactly replica tables are used, and joins are handled largely depends on your query.
Best, Jennie
Is that the general strategy?
Thanks,
-K _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Brian Hood
-
Joeri van Ruth
-
Kelly Burkhart
-
Ying Zhang