[MonetDB-users] Workflow for independent, read only, discretely updated tables.
Hi,
I'm evaluating MonetDB, and the following is based on reading the docs
and some of the monetdb-users email list archive.
I wonder if the following setup+workflow is correct, and if so, is it
monetdb best practice, given this use case:
We will have read only tables, each distributed across several
machines/servers, they will be updated daily.
There will be no cross table queries, i.e. only one table touched by each query.
We would like to be able to update each table while not affecting the
availability of any other table.
The table_db's are not horizontally sharded, i.e. all the data for a
query will always come from that one table.
We cannot use udp multicast/broadcast so a monetdb cluster is not
possible (unless a localhost cluster is possible/sensible?).
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer.
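For reference, a cleaned-up sketch of that per-machine refresh: the tool name is `monetdb` (the `monetbd` spelling is a typo acknowledged later in this thread), the `monetdb stop` step is dropped per the follow-up later in the thread, and the database name `mytable_db` is a placeholder.

$> monetdb lock mytable_db                      # maintenance mode: refuse normal client connections
$> mclient -u monetdb -d mytable_db updatefile  # run the COPY INTO statements from the file
$> monetdb release mytable_db                   # take it out of maintenance mode again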
On 21-11-2011 06:20:36 +1100, Hedge Hog wrote:
We would like to be able to update each table while not affecting the availability of any other table. The table_db's are not horizontally sharded, i.e. all the data for a query will always come from that one table. We cannot use udp multicast/broadcast so a monetdb cluster is not possible (unless a localhost cluster is possible/sensible?).
On a local machine, monetdbd "knows" all local databases; pattern matches might fail if discovery broadcasts aren't seen by monetdbd itself.
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
Does this work?
monetdbd was designed to refuse to start
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer
Do you, or don't you use a cluster in the end? If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
On Mon, Nov 21, 2011 at 6:35 AM, Fabian Groffen wrote:
On 21-11-2011 06:20:36 +1100, Hedge Hog wrote:
We would like to be able to update each table while not affecting the availability of any other table. The table_db's are not horizontally sharded, i.e. all the data for a query will always come from that one table. We cannot use udp multicast/broadcast so a monetdb cluster is not possible (unless a localhost cluster is possible/sensible?).
On a local machine, monetdbd "knows" all local databases; pattern matches might fail if discovery broadcasts aren't seen by monetdbd itself.
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
Does this work?
Likely not: fat fingers; you are right, the command should have been `monetdb`. I've just been 'doing my homework', reviewing documentation before selecting a project to start exploring. These are the how-to notes I made as I read.
monetdbd was designed to refuse to start
here.
Thanks for clarifying this. I'll remove the `monetdb stop ...` command.
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer
Do you, or don't you use a cluster in the end?
As indicated, we don't have udp broadcast/multicast, so my understanding is this rules out remote clustering. However, we will still be using multiple machines - just un-clustered. Sorry for being ambiguous.
If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
Interesting. Is there a reason I cannot use an rsync copy (after locking the DB, of course) of the dbfarm/dbname directories onto each (unclustered) machine? Would such an os-copy-update be identical to an mclient-copy-update, or does the mclient copy create some required metadata? Appreciate any clarifications. TIA
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
On 21-11-2011 07:25:52 +1100, Mark V wrote:
Do you, or don't you use a cluster in the end?
As indicated we don't have udp broadcast/multicast, so my understanding was this rules out remote clustering. However we will still be using multiple machines - just un-clustered. Sorry for being ambiguous.
Ok, cluster to me means: multiple machines (in a network).
If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
Interesting. Is there a reason I cannot use an rsync copy (after locking the DB, of course) of the dbfarm/dbname directories onto each (unclustered) machine? Would such an os-copy-update be identical to an mclient-copy-update, or does the mclient copy create some required metadata?
Given identical hardware, you can just copy the dbfarm/dbname directories around. The only thing you need to keep in mind is that the files must not get altered while you copy, hence you probably need a lock + stop beforehand, to ensure a consistent image is being used.
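In concrete terms, that copy-based refresh might look like the following on the master machine. The dbfarm path, the database name `mytable_db`, and the host `web1` are placeholders, and each receiving machine would likewise lock and stop its copy of the database before the files are replaced:

$> monetdb lock mytable_db    # refuse new client connections
$> monetdb stop mytable_db    # ensure the files on disk are no longer being altered
$> rsync -a --delete /path/to/dbfarm/mytable_db/ web1:/path/to/dbfarm/mytable_db/
$> monetdb release mytable_db # take the master out of maintenance mode
$> monetdb start mytable_db   # resume serving from the master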
participants (3)
- Fabian Groffen
- Hedge Hog
- Mark V