MonetDB clusters vs. distributed query processing
Hi,

I am trying to understand the main difference between clusters and the remote tables introduced in Jul2015. Correct me if I am wrong: in a cluster the complete table is copied to all nodes, and MonetDB executes the query in parallel on the nodes and concatenates the results. With remote tables I have more control: I can execute the query on part of a table on a selected node, depending on the available resources.

If this is true, are the following points valid?

1. It is better to have identical nodes in a cluster.
2. The execution time is inversely proportional to the number of nodes in the cluster.
3. The execution time is dictated by the slowest node in the cluster.
4. Fail-over is supported by default in a cluster configuration.

One last question: which queries are supported in these two modes (GROUP BY, ORDER BY, LIMIT)?

A reference to any documentation about clustering would be appreciated.

Thank you.
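Points 2 and 3 of the question can be made concrete with a toy timing model. This is only a sketch under stated assumptions, not a description of how MonetDB schedules work: it assumes rows are split evenly across nodes, each node scans at a fixed rate, and the result is ready only when every node has finished. The function name and the scan rates are hypothetical.

```python
def parallel_scan_time(total_rows, node_speeds):
    """Wall-clock seconds for an evenly split scan run in parallel.

    node_speeds holds a hypothetical scan rate (rows per second) per node.
    """
    rows_per_node = total_rows / len(node_speeds)
    # All partial results are needed before they can be concatenated,
    # so the slowest node determines the total time.
    return max(rows_per_node / speed for speed in node_speeds)


# Identical nodes: doubling the node count halves the time (point 2).
two_nodes = parallel_scan_time(1_000_000, [100_000, 100_000])
four_nodes = parallel_scan_time(1_000_000, [100_000] * 4)

# One slow node among four dominates the total (points 1 and 3).
mixed_nodes = parallel_scan_time(1_000_000, [100_000, 100_000, 100_000, 25_000])

print(two_nodes, four_nodes, mixed_nodes)
```

Under these assumptions the four-node cluster with one slow member is slower than two identical fast nodes, which is why identical hardware matters when the work is split evenly.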
I think it's something like this:

shard - something like RAID 0. Each shard holds part of the full table.
Reads are faster because each shard holds a well-known part of the table,
for example the rows where column1 > 10. You can use MPP too, but on top of
plain MPP you get something like "table partitioning", which speeds up some
queries and slows down full table scans in some cases.

federate - like symbolic links. The table lives on another server, but you
define it on the current server so that it reads from that server. This
lets apps keep using the database/table without code changes.

cluster - something like RAID 1 / DRBD. Each server holds either part of
the full table (cluster + shard) or the full table (cluster only). Writes
are slower because, to ensure a good write, all servers must return
"commit ok". That applies to a synchronous setup; an asynchronous one does
not need to wait but does not give consistent reads, and a semi-synchronous
cluster has the same problem in a more controlled scenario. Reads can be
faster because you can use MPP queries / multi-threaded reads across more
than one host/node. You also get a fail-safe feature here, because if one
server dies, the other has all the data.
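The shard idea above (each shard holds a well-known part of the table, such as column1 > 10) can be sketched in a few lines. This is a toy illustration and not MonetDB code: the Shard class, the node names, and the two helper functions are all hypothetical, and real systems keep this range information in their catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    low: float    # inclusive lower bound of column1 stored in this shard
    high: float   # inclusive upper bound of column1 stored in this shard
    rows: list = field(default_factory=list)


def shards_for_greater_than(shards, threshold):
    """Return only the shards that can contain rows with column1 > threshold."""
    return [s for s in shards if s.high > threshold]


def select_greater_than(shards, threshold):
    """Run 'column1 > threshold' and report which shards had to be read."""
    touched = shards_for_greater_than(shards, threshold)
    result = [v for s in touched for v in s.rows if v > threshold]
    return result, [s.name for s in touched]


# Hypothetical layout: node_a holds column1 <= 10, node_b holds column1 > 10.
shards = [
    Shard("node_a", float("-inf"), 10, [1, 5, 10]),
    Shard("node_b", 11, float("inf"), [11, 20, 42]),
]

# A predicate that matches the split reads a single shard.
rows, touched = select_greater_than(shards, 10)

# A predicate that spans the split has to read every shard.
rows_all, touched_all = select_greater_than(shards, 0)

print(rows, touched)
print(rows_all, touched_all)
```

The first query skips node_a entirely, which is the speed-up for partition-friendly predicates; the second still has to visit both nodes, which is the full-scan case where sharding does not help.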
In reply to the message from imad hajj chahine, 2015-09-22 7:08 GMT-03:00
-- Roberto Spadim SPAEmpresarial - Software ERP Eng. Automação e Controle