[MonetDB-users] Workflow for independent, read only, discretely updated tables.
Hi,
I'm evaluating MonetDB, and the following is based on reading the docs
and some of the monetdb-users email list archive.
I wonder if the following setup+workflow is correct, and if so, is it
monetdb best practice, given this use case:
We will have read only tables, each distributed across several
machines/servers, they will be updated daily.
There will be no cross table queries, i.e. only one table touched by each query.
We would like to be able to update each table while not affecting the
availability of any other table.
The table_db's are not horizontally sharded, i.e. all the data for a
query will always come from that one table.
We cannot use udp multicast/broadcast so a monetdb cluster is not
possible (unless a localhost cluster is possible/sensible?).
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer.
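For reference, a cleaned-up sketch of that per-machine refresh: the tool name is `monetdb` (the `monetbd` spelling is a typo acknowledged later in this thread), the `monetdb stop` step is dropped per the follow-up later in the thread, and the database name `mytable_db` is a placeholder.

$> monetdb lock mytable_db                      # maintenance mode: refuse normal client connections
$> mclient -u monetdb -d mytable_db updatefile  # run the COPY INTO statements from the file
$> monetdb release mytable_db                   # take it out of maintenance mode again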
On 21-11-2011 06:20:36 +1100, Hedge Hog wrote:
We would like to be able to update each table while not affecting the availability of any other table. The table_db's are not horizontally sharded, i.e. all the data for a query will always come from that one table. We cannot use udp multicast/broadcast so a monetdb cluster is not possible (unless a localhost cluster is possible/sensible?).
On a local machine, monetdbd "knows" all local databases; pattern matches might fail if discovery broadcasts aren't seen by monetdbd itself.
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
Does this work?
monetdbd was designed to refuse to start
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer
Do you, or don't you use a cluster in the end? If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
On Mon, Nov 21, 2011 at 6:35 AM, Fabian Groffen wrote:
On 21-11-2011 06:20:36 +1100, Hedge Hog wrote:
We would like to be able to update each table while not affecting the availability of any other table. The table_db's are not horizontally sharded, i.e. all the data for a query will always come from that one table. We cannot use udp multicast/broadcast so a monetdb cluster is not possible (unless a localhost cluster is possible/sensible?).
On a local machine, monetdbd "knows" all local databases; pattern matches might fail if discovery broadcasts aren't seen by monetdbd itself.
The setup+workflow:
- one table per database (this allows for independent table updates), let these be
.
- update a master (writable) instance of the table on a 'special' monetdbd/machine.
- copy the updated table to each machine.
- To update the table on a machine:
$> monetbd lock
$> monetbd stop
$> mclient -u monetdb updatefile
Does this work?
Likely not: fat fingers; you are right, the command should have been `monetdb`. I've just been 'doing my homework', reviewing documentation before selecting a project to start exploring. These are the how-to notes I made as I read.
monetdbd was designed to refuse to start
here.
Thanks for clarifying this. I'll remove the `monetdb stop ...` command.
$> cat updatefile
copy into MyTable from ('path_to_mytable_col_file_i', 'path_to_mytable_col_file_f', 'path_to_mytable_col_file_s');
$> monetbd release
$> monetbd start
Is the above the best pattern/architecture of monetdb for such a use case?
Appreciate any insights people can offer
Do you, or don't you use a cluster in the end?
As indicated, we don't have udp broadcast/multicast, so my understanding is this rules out remote clustering. However, we will still be using multiple machines - just un-clustered. Sorry for being ambiguous.
If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
Interesting. Is there a reason I cannot use an rsync copy (after locking the DB, of course) of the dbfarm/dbname directories onto each (unclustered) machine? Would such an os-copy-update be identical to an mclient-copy-update, or does the mclient copy create some required metadata? Appreciate any clarifications. TIA
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
On 21-11-2011 07:25:52 +1100, Mark V wrote:
Do you, or don't you use a cluster in the end?
As indicated we don't have udp broadcast/multicast, so my understanding was this rules out remote clustering. However we will still be using multiple machines - just un-clustered. Sorry for being ambiguous.
Ok, cluster to me means: multiple machines (in a network).
If so, you probably can use rsync as well to sync the dbfarm/dbname directories of the ones you load new data into.
Interesting. Is there a reason I cannot use an rsync copy (after locking the DB, of course) of the dbfarm/dbname directories onto each (unclustered) machine? Would such an os-copy-update be identical to an mclient-copy-update, or does the mclient copy create some required metadata?
Given identical hardware, you can just copy the dbfarm/dbname directories around. The only thing you need to keep in mind is that the files must not get altered while you copy, hence you probably need a lock + stop beforehand, to ensure a consistent image is being used.
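In concrete terms, that copy-based refresh might look like the following on the master machine. The dbfarm path, the database name `mytable_db`, and the host `web1` are placeholders, and each receiving machine would likewise lock and stop its copy of the database before the files are replaced:

$> monetdb lock mytable_db    # refuse new client connections
$> monetdb stop mytable_db    # ensure the files on disk are no longer being altered
$> rsync -a --delete /path/to/dbfarm/mytable_db/ web1:/path/to/dbfarm/mytable_db/
$> monetdb release mytable_db # take the master out of maintenance mode
$> monetdb start mytable_db   # resume serving from the master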
participants (3)
- Fabian Groffen
- Hedge Hog
- Mark V