[MonetDB-users] COPY INTO on Ubuntu 64bit is slow
Hi guys.... I've got to machines running Monet. 1. Quad core Intel with 4GB RAM, Ubuntu 8.04 32BIT (server ed) 2. AMD64 Quad Core, 4GB RAM, KUbuntu 8.04 64BIT (kde 3.5.9) On system 1 I can do a "copy into" of 6million records in no time at all :) On system 2 the same function, with the exact same data takes forever. Any thoughts on the reason? Is this a memory related issue? Cronje
On Tue, Aug 05, 2008 at 05:25:09PM +0200, Cronje Fourie wrote:
Hi guys....
I've got to machines running Monet. 1. Quad core Intel with 4GB RAM, Ubuntu 8.04 32BIT (server ed) 2. AMD64 Quad Core, 4GB RAM, KUbuntu 8.04 64BIT (kde 3.5.9)
On system 1 I can do a "copy into" of 6million records in no time at all :)
On system 2 the same function, with the exact same data takes forever.
Any thoughts on the reason? Is this a memory related issue?
I assume that you configure MonetDB on both machines with the same (if any) optimization and/or debugging options and compiled it with the same (versions of) tools (in particular compiler). Morover, I assume that you built a 64-bit version with 64-bit OIDs on the 64-bit machine (should be the default on a 64-bit machines). Finally, I assume that both machines have comparable I/O systems (disks). Did/can you check how big your mserver5 process is growing on both machines during the copy into, and/or whether either machines (obviously the second one) is swapping heavily during the copy into? How wide are your records, i.e., what does your 'create table' statement look like? Do you load from a plain (uncompressed) file or from a compressed (how) file? Stefan
Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
On Tue, Aug 05, 2008 at 05:25:09PM +0200, Cronje Fourie wrote:
Hi guys....
I've got to machines running Monet. 1. Quad core Intel with 4GB RAM, Ubuntu 8.04 32BIT (server ed) 2. AMD64 Quad Core, 4GB RAM, KUbuntu 8.04 64BIT (kde 3.5.9) Very good that you list you hardware details. Could you also include
On Tue, Aug 05, 2008 at 06:15:38PM +0200, Stefan Manegold wrote: the version of MonetDB you are currently using (on both) systems? Niels
On system 1 I can do a "copy into" of 6million records in no time at all :)
On system 2 the same function, with the exact same data takes forever.
Any thoughts on the reason? Is this a memory related issue?
I assume that you configure MonetDB on both machines with the same (if any) optimization and/or debugging options and compiled it with the same (versions of) tools (in particular compiler).
Morover, I assume that you built a 64-bit version with 64-bit OIDs on the 64-bit machine (should be the default on a 64-bit machines).
Finally, I assume that both machines have comparable I/O systems (disks).
Did/can you check how big your mserver5 process is growing on both machines during the copy into, and/or whether either machines (obviously the second one) is swapping heavily during the copy into?
How wide are your records, i.e., what does your 'create table' statement look like?
Do you load from a plain (uncompressed) file or from a compressed (how) file?
Stefan
Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Hi Stefan... All your assumptions are correct. - Both systems use a single SATA disk, SWAP and data partition on same disks - Compiled the MonetDB, from Jun-SuperTar, with options: --enable-sql --enable-optimise --nightly-stable - gcc etc are all latest on both machines, gcc 4.2.3 - compiled with 64bit OID's on 64bit machine. - I'm loading plain text data - Including the create table statement at the bottom, it's a whopper :) a couple of date fields are loaded as VARCHAR because the source data format is not compatible with MonetDB. On the load side the VM usage went up to 11GB...on the 64bit system. I can't remember the exact value on the 32bit system but was around 3GB for VM. One major difference between the two systems is that the 32bit system is a SERVER install, vs full KDE on the 64bit. I'm now trying to run the same load on a 32bit system, which also has KDE installed and it is also much slower than on the server. Incidentally VM usage on this machine is 1GB I'm downloading the ubuntu server image at the moment and will install and test on that shortly, then get back to you with the results. CREATE TABLE PrimalPool ( id10 char(10) NOT NULL, idno varchar(13) NOT NULL, cell_phone varchar(20) NULL, combined_expenses_monthly double NULL, combined_income_monthly double NULL, customer_age_id integer NULL, deceased tinyint NULL, demographic char(10) NULL, do_not_mail tinyint NULL, dob varchar(50) NULL, email varchar(150) NULL, emperica_score integer NULL, empscore_date varchar(50) NULL, ethnicity_id char(1) NULL, forename_1 varchar(20) NULL, forename_2 varchar(20) NULL, forename_3 varchar(20) NULL, gender_id char(1) NULL, home_language char(15) NULL, home_phone varchar(20) NULL, initials varchar(3) NULL, language char(1) NULL, marital_status_id char(1) NULL, market_sector_id tinyint NULL, prospect_age tinyint NULL, surname varchar(50) NULL, title_id varchar(20) NULL, work_phone varchar(20) NULL, bond_rent_payments integer NULL, bureau_installment_monthly double NULL, expense_bond_monthly double NULL, expense_maintenance_monthly double NULL, expense_municipal_services_monthly double NULL, income_tax_monthly double NULL, maint_payments integer NULL, municipal_payments integer NULL, net_income_monthly double NULL, pension_contrib_monthly double NULL, salary_gross_monthly double NULL, salary_other_monthly double NULL, statutory_expenses_monthly double NULL, uif_contrib_monthly double NULL, sbcscore integer NULL, sbcscore_date varchar(50) NULL, source_id integer NULL, account_charge_off tinyint NULL, avg_account_age integer NULL, avg_amount_offered integer NULL, avg_bscore integer NULL, avg_bscore_range integer NULL, avg_credit_limit integer NULL, avg_hilife_balance integer NULL, avg_income integer NULL, has_bad_exclusion_score tinyint NULL, has_negative_block_code tinyint NULL, has_payment_projection_score tinyint NULL, has_staff_loan tinyint NULL, invalid_logo_1stloan tinyint NULL, mail_date varchar(50) NULL, mailing_indicator tinyint NULL, mailing_key tinyint NULL, max_account_age integer NULL, max_agreement_date varchar(50) NULL, max_amount_offered integer NULL, max_bscore integer NULL, max_bscore_date varchar(50) NULL, max_bscore_range integer NULL, max_credit_limit integer NULL, max_hilife_balance integer NULL, max_income integer NULL, max_open_date varchar(50) NULL, min_account_age integer NULL, min_agreement_date varchar(50) NULL, min_amount_offered integer NULL, min_bscore integer NULL, min_bscore_date varchar(50) NULL, min_bscore_range integer NULL, min_credit_limit integer NULL, min_hilife_balance integer NULL, min_income integer NULL, min_open_date varchar(50) NULL, no_accounts tinyint NULL, no_active_accounts tinyint NULL, no_inactive_accounts tinyint NULL, rcs_offer_campaign_no varchar(20) NULL, rcs_offer_expiry_date varchar(50) NULL, rcs_offer_interest_rate tinyint NULL, sum_contract_delinquency integer NULL, sum_payment_recency tinyint NULL, sum_return_mail integer NULL, sum_warning_code integer NULL, account_active char(1) NULL, account_age integer NULL, account_id bigint NULL, account_status char(3) NULL, agreement_date varchar(50) NULL, amount_offered double NULL, application_id varchar(20) NULL, block_code_1 varchar(20) NULL, block_code_2 varchar(20) NULL, bscore integer NULL, bscore_date varchar(50) NULL, bscore_range varchar(20) NULL, campaign_no varchar(20) NULL, contract_date varchar(50) NULL, contract_delinquency tinyint NULL, credit_limit double NULL, crlife_admin double NULL, current_balance double NULL, customer_id char(30) NULL, entity_id integer NULL, expiry_date varchar(50) NULL, group_id integer NULL, hilife_balance double NULL, income double NULL, installment_value double NULL, interest_rate double NULL, logo_id varchar(3) NULL, open_date varchar(50) NULL, palink_type varchar(20) NULL, payment_recency integer NULL, payout_amount double NULL, payout_date varchar(50) NULL, physical_address_id char(32) NULL, principal_value double NULL, prodcatid integer NULL, res_address_id char(32) NULL, return_mail integer NULL, risk_rating varchar(20) NULL, tag_date varchar(50) NULL, term_period tinyint NULL, warning_code tinyint NULL, warning_code1 integer NULL, warning_code2 integer NULL, warning_code3 integer NULL, write_off_balance double NULL, accno varchar(20) NULL, account_no varchar(20) NULL, address_class varchar(1) NULL, address_code varchar(3) NULL, address_id_ia varchar(30) NULL, address_line1 varchar(65) NULL, address_line2 varchar(65) NULL, address_line3 varchar(65) NULL, address_line4 varchar(65) NULL, address_line5 varchar(65) NULL, address_source_date varchar(50) NULL, address_type varchar(3) NULL, address_type_id char(1) NULL, amc varchar(15) NULL, amc_id varchar(10) NULL, building_type varchar(3) NULL, change_status varchar(3) NULL, clean_address_line1 varchar(65) NULL, clean_address_line2 varchar(65) NULL, clean_address_line3 varchar(65) NULL, clean_address_line4 varchar(65) NULL, clean_address_line5 varchar(65) NULL, country varchar(50) NULL, country_code varchar(6) NULL, delivery_code varchar(1) NULL, dpid varchar(20) NULL, lines_in tinyint NULL, lmc varchar(20) NULL, lmc_id varchar(15) NULL, location_type varchar(20) NULL, mpcode varchar(5) NULL, mpname varchar(36) NULL, postcode varchar(10) NULL, region varchar(40) NULL, sec_location_type varchar(20) NULL, spcode varchar(8) NULL, spname varchar(25) NULL, structure_status varchar(3) NULL, suburb varchar(65) NULL, suburb_ethnic_code varchar(2) NULL, suburb_income tinyint NULL, suburb_income_band varchar(2) NULL, suburb_lat double NULL, suburb_long double NULL, tag varchar(40) NULL, town varchar(65) NULL, umc varchar(15) NULL, validity_status varchar(3) NULL, prospect_right bigint null DEFAULT 0, has_email tinyint NULL DEFAULT 0, has_address tinyint NULL DEFAULT 0, has_phone tinyint NULL DEFAULT 0, has_home_phone tinyint NULL DEFAULT 0, has_cell_phone tinyint NULL DEFAULT 0, has_work_phone tinyint NULL DEFAULT 0, chargeoff_date varchar(50) NULL, mloans_index char(30) NULL, filler_space char(1) NULL DEFAULT ' ', filler_empty varchar(1) NULL DEFAULT '', curr_date varchar(50) NULL DEFAULT '', prospect_owner bigint NULL DEFAULT 0, reference_no varchar(20) NULL, c char(30) NULL DEFAULT '0' ) GO Thanks for the help Cronje Stefan Manegold wrote:
On Tue, Aug 05, 2008 at 05:25:09PM +0200, Cronje Fourie wrote:
Hi guys....
I've got to machines running Monet. 1. Quad core Intel with 4GB RAM, Ubuntu 8.04 32BIT (server ed) 2. AMD64 Quad Core, 4GB RAM, KUbuntu 8.04 64BIT (kde 3.5.9)
On system 1 I can do a "copy into" of 6million records in no time at all :)
On system 2 the same function, with the exact same data takes forever.
Any thoughts on the reason? Is this a memory related issue?
I assume that you configure MonetDB on both machines with the same (if any) optimization and/or debugging options and compiled it with the same (versions of) tools (in particular compiler).
Morover, I assume that you built a 64-bit version with 64-bit OIDs on the 64-bit machine (should be the default on a 64-bit machines).
Finally, I assume that both machines have comparable I/O systems (disks).
Did/can you check how big your mserver5 process is growing on both machines during the copy into, and/or whether either machines (obviously the second one) is swapping heavily during the copy into?
How wide are your records, i.e., what does your 'create table' statement look like?
Do you load from a plain (uncompressed) file or from a compressed (how) file?
Stefan
Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://www.nabble.com/COPY-INTO-on-Ubuntu-64bit-is-slow-tp18833879p18848282.... Sent from the monetdb-users mailing list archive at Nabble.com.
On Wed, Aug 06, 2008 at 03:16:28AM -0700, Cronje Fourie wrote:
Hi Stefan...
All your assumptions are correct. - Both systems use a single SATA disk, SWAP and data partition on same disks - Compiled the MonetDB, from Jun-SuperTar, with options: --enable-sql --enable-optimise --nightly-stable - gcc etc are all latest on both machines, gcc 4.2.3 - compiled with 64bit OID's on 64bit machine. - I'm loading plain text data - Including the create table statement at the bottom, it's a whopper :) a couple of date fields are loaded as VARCHAR because the source data format is not compatible with MonetDB.
~ 200 attributes --- "impressive" ;-)
On the load side the VM usage went up to 11GB...on the 64bit system. I can't remember the exact value on the 32bit system but was around 3GB for VM.
Total VM usage of the system, or only of the mserver5 process? Assuming that all data type you use occupy the same storage with both 32- & 64-bit systems (Niels might comment on this one), the (only?) difference between 32- & 64-bit systems is the size of OIDs and hash values used by MonetDB internally. By default, they are 32-bit on 32-bit systems and 64-bit on 64-bit systems --- the latter obviously occupying more (double the) space than the former. At configure time, you can decide to build a 64-bit MonetDB with 32-bit OIDs (and hence hash values) (which --- in short --- will restrict tables to at most 2 billion records as on 32-bit systems --- but that should not be a problem given your 6 million records, right?) by calling configure with option "--enable-oid32". When using the monetdb-install.sh script, you have to resort to the "developers options/features" (cf., `monetdb-install.sh --devhelp`), e.g., as follows EXTRA_ECONF=--enable-oid32 monetdb-install.sh ... This might (should) reduce the memory footprint of mserver5 considerably.
One major difference between the two systems is that the 32bit system is a SERVER install, vs full KDE on the 64bit.
I'm now trying to run the same load on a 32bit system, which also has KDE installed and it is also much slower than on the server. Incidentally VM usage on this machine is 1GB
Hm, in case "server" vs. "KDE" make the difference, it's either that KDE consumes very (too?) much memory, leaving only little for other programs (as, e.g., MonetDB), or there's "something strange" going on that "blocks" MonetDB --- I would have no idea what that could be ...
I'm downloading the ubuntu server image at the moment and will install and test on that shortly, then get back to you with the results.
Yes, that would be great. I am / we are curious the hear what you find out! Stefan
CREATE TABLE PrimalPool ( id10 char(10) NOT NULL, idno varchar(13) NOT NULL, cell_phone varchar(20) NULL, combined_expenses_monthly double NULL, combined_income_monthly double NULL, customer_age_id integer NULL, deceased tinyint NULL, demographic char(10) NULL, do_not_mail tinyint NULL, dob varchar(50) NULL, email varchar(150) NULL, emperica_score integer NULL, empscore_date varchar(50) NULL, ethnicity_id char(1) NULL, forename_1 varchar(20) NULL, forename_2 varchar(20) NULL, forename_3 varchar(20) NULL, gender_id char(1) NULL, home_language char(15) NULL, home_phone varchar(20) NULL, initials varchar(3) NULL, language char(1) NULL, marital_status_id char(1) NULL, market_sector_id tinyint NULL, prospect_age tinyint NULL, surname varchar(50) NULL, title_id varchar(20) NULL, work_phone varchar(20) NULL, bond_rent_payments integer NULL, bureau_installment_monthly double NULL, expense_bond_monthly double NULL, expense_maintenance_monthly double NULL, expense_municipal_services_monthly double NULL, income_tax_monthly double NULL, maint_payments integer NULL, municipal_payments integer NULL, net_income_monthly double NULL, pension_contrib_monthly double NULL, salary_gross_monthly double NULL, salary_other_monthly double NULL, statutory_expenses_monthly double NULL, uif_contrib_monthly double NULL, sbcscore integer NULL, sbcscore_date varchar(50) NULL, source_id integer NULL, account_charge_off tinyint NULL, avg_account_age integer NULL, avg_amount_offered integer NULL, avg_bscore integer NULL, avg_bscore_range integer NULL, avg_credit_limit integer NULL, avg_hilife_balance integer NULL, avg_income integer NULL, has_bad_exclusion_score tinyint NULL, has_negative_block_code tinyint NULL, has_payment_projection_score tinyint NULL, has_staff_loan tinyint NULL, invalid_logo_1stloan tinyint NULL, mail_date varchar(50) NULL, mailing_indicator tinyint NULL, mailing_key tinyint NULL, max_account_age integer NULL, max_agreement_date varchar(50) NULL, max_amount_offered integer NULL, max_bscore integer NULL, max_bscore_date varchar(50) NULL, max_bscore_range integer NULL, max_credit_limit integer NULL, max_hilife_balance integer NULL, max_income integer NULL, max_open_date varchar(50) NULL, min_account_age integer NULL, min_agreement_date varchar(50) NULL, min_amount_offered integer NULL, min_bscore integer NULL, min_bscore_date varchar(50) NULL, min_bscore_range integer NULL, min_credit_limit integer NULL, min_hilife_balance integer NULL, min_income integer NULL, min_open_date varchar(50) NULL, no_accounts tinyint NULL, no_active_accounts tinyint NULL, no_inactive_accounts tinyint NULL, rcs_offer_campaign_no varchar(20) NULL, rcs_offer_expiry_date varchar(50) NULL, rcs_offer_interest_rate tinyint NULL, sum_contract_delinquency integer NULL, sum_payment_recency tinyint NULL, sum_return_mail integer NULL, sum_warning_code integer NULL, account_active char(1) NULL, account_age integer NULL, account_id bigint NULL, account_status char(3) NULL, agreement_date varchar(50) NULL, amount_offered double NULL, application_id varchar(20) NULL, block_code_1 varchar(20) NULL, block_code_2 varchar(20) NULL, bscore integer NULL, bscore_date varchar(50) NULL, bscore_range varchar(20) NULL, campaign_no varchar(20) NULL, contract_date varchar(50) NULL, contract_delinquency tinyint NULL, credit_limit double NULL, crlife_admin double NULL, current_balance double NULL, customer_id char(30) NULL, entity_id integer NULL, expiry_date varchar(50) NULL, group_id integer NULL, hilife_balance double NULL, income double NULL, installment_value double NULL, interest_rate double NULL, logo_id varchar(3) NULL, open_date varchar(50) NULL, palink_type varchar(20) NULL, payment_recency integer NULL, payout_amount double NULL, payout_date varchar(50) NULL, physical_address_id char(32) NULL, principal_value double NULL, prodcatid integer NULL, res_address_id char(32) NULL, return_mail integer NULL, risk_rating varchar(20) NULL, tag_date varchar(50) NULL, term_period tinyint NULL, warning_code tinyint NULL, warning_code1 integer NULL, warning_code2 integer NULL, warning_code3 integer NULL, write_off_balance double NULL, accno varchar(20) NULL, account_no varchar(20) NULL, address_class varchar(1) NULL, address_code varchar(3) NULL, address_id_ia varchar(30) NULL, address_line1 varchar(65) NULL, address_line2 varchar(65) NULL, address_line3 varchar(65) NULL, address_line4 varchar(65) NULL, address_line5 varchar(65) NULL, address_source_date varchar(50) NULL, address_type varchar(3) NULL, address_type_id char(1) NULL, amc varchar(15) NULL, amc_id varchar(10) NULL, building_type varchar(3) NULL, change_status varchar(3) NULL, clean_address_line1 varchar(65) NULL, clean_address_line2 varchar(65) NULL, clean_address_line3 varchar(65) NULL, clean_address_line4 varchar(65) NULL, clean_address_line5 varchar(65) NULL, country varchar(50) NULL, country_code varchar(6) NULL, delivery_code varchar(1) NULL, dpid varchar(20) NULL, lines_in tinyint NULL, lmc varchar(20) NULL, lmc_id varchar(15) NULL, location_type varchar(20) NULL, mpcode varchar(5) NULL, mpname varchar(36) NULL, postcode varchar(10) NULL, region varchar(40) NULL, sec_location_type varchar(20) NULL, spcode varchar(8) NULL, spname varchar(25) NULL, structure_status varchar(3) NULL, suburb varchar(65) NULL, suburb_ethnic_code varchar(2) NULL, suburb_income tinyint NULL, suburb_income_band varchar(2) NULL, suburb_lat double NULL, suburb_long double NULL, tag varchar(40) NULL, town varchar(65) NULL, umc varchar(15) NULL, validity_status varchar(3) NULL, prospect_right bigint null DEFAULT 0, has_email tinyint NULL DEFAULT 0, has_address tinyint NULL DEFAULT 0, has_phone tinyint NULL DEFAULT 0, has_home_phone tinyint NULL DEFAULT 0, has_cell_phone tinyint NULL DEFAULT 0, has_work_phone tinyint NULL DEFAULT 0, chargeoff_date varchar(50) NULL, mloans_index char(30) NULL, filler_space char(1) NULL DEFAULT ' ', filler_empty varchar(1) NULL DEFAULT '', curr_date varchar(50) NULL DEFAULT '', prospect_owner bigint NULL DEFAULT 0, reference_no varchar(20) NULL, c char(30) NULL DEFAULT '0' ) GO
Thanks for the help Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
On Wed, Aug 06, 2008 at 03:16:34PM +0200, Stefan Manegold wrote:
On Wed, Aug 06, 2008 at 03:16:28AM -0700, Cronje Fourie wrote:
Hi Stefan...
All your assumptions are correct. - Both systems use a single SATA disk, SWAP and data partition on same disks - Compiled the MonetDB, from Jun-SuperTar, with options: --enable-sql --enable-optimise --nightly-stable - gcc etc are all latest on both machines, gcc 4.2.3 - compiled with 64bit OID's on 64bit machine. - I'm loading plain text data - Including the create table statement at the bottom, it's a whopper :) a couple of date fields are loaded as VARCHAR because the source data format is not compatible with MonetDB.
~ 200 attributes --- "impressive" ;-)
On the load side the VM usage went up to 11GB...on the 64bit system. I can't remember the exact value on the 32bit system but was around 3GB for VM.
Total VM usage of the system, or only of the mserver5 process?
Assuming that all data type you use occupy the same storage with both 32- & 64-bit systems (Niels might comment on this one), the (only?) difference between 32- & 64-bit systems is the size of OIDs and hash values used by MonetDB internally. By default, they are 32-bit on 32-bit systems and 64-bit on 64-bit systems --- the latter obviously occupying more (double the) space than the former. Given that the table doesn't have any 'key constraints', the loading should use the exact same amount of memory. Only during processing (queries or key constraint checking) the hash and oid values play a role. Indeed the other types should on 32 and 64 bit systems be of equal size.
Niels
At configure time, you can decide to build a 64-bit MonetDB with 32-bit OIDs (and hence hash values) (which --- in short --- will restrict tables to at most 2 billion records as on 32-bit systems --- but that should not be a problem given your 6 million records, right?) by calling configure with option "--enable-oid32". When using the monetdb-install.sh script, you have to resort to the "developers options/features" (cf., `monetdb-install.sh --devhelp`), e.g., as follows
EXTRA_ECONF=--enable-oid32 monetdb-install.sh ...
This might (should) reduce the memory footprint of mserver5 considerably.
One major difference between the two systems is that the 32bit system is a SERVER install, vs full KDE on the 64bit.
I'm now trying to run the same load on a 32bit system, which also has KDE installed and it is also much slower than on the server. Incidentally VM usage on this machine is 1GB
Hm, in case "server" vs. "KDE" make the difference, it's either that KDE consumes very (too?) much memory, leaving only little for other programs (as, e.g., MonetDB), or there's "something strange" going on that "blocks" MonetDB --- I would have no idea what that could be ...
I'm downloading the ubuntu server image at the moment and will install and test on that shortly, then get back to you with the results.
Yes, that would be great. I am / we are curious the hear what you find out!
Stefan
CREATE TABLE PrimalPool ( id10 char(10) NOT NULL, idno varchar(13) NOT NULL, cell_phone varchar(20) NULL, combined_expenses_monthly double NULL, combined_income_monthly double NULL, customer_age_id integer NULL, deceased tinyint NULL, demographic char(10) NULL, do_not_mail tinyint NULL, dob varchar(50) NULL, email varchar(150) NULL, emperica_score integer NULL, empscore_date varchar(50) NULL, ethnicity_id char(1) NULL, forename_1 varchar(20) NULL, forename_2 varchar(20) NULL, forename_3 varchar(20) NULL, gender_id char(1) NULL, home_language char(15) NULL, home_phone varchar(20) NULL, initials varchar(3) NULL, language char(1) NULL, marital_status_id char(1) NULL, market_sector_id tinyint NULL, prospect_age tinyint NULL, surname varchar(50) NULL, title_id varchar(20) NULL, work_phone varchar(20) NULL, bond_rent_payments integer NULL, bureau_installment_monthly double NULL, expense_bond_monthly double NULL, expense_maintenance_monthly double NULL, expense_municipal_services_monthly double NULL, income_tax_monthly double NULL, maint_payments integer NULL, municipal_payments integer NULL, net_income_monthly double NULL, pension_contrib_monthly double NULL, salary_gross_monthly double NULL, salary_other_monthly double NULL, statutory_expenses_monthly double NULL, uif_contrib_monthly double NULL, sbcscore integer NULL, sbcscore_date varchar(50) NULL, source_id integer NULL, account_charge_off tinyint NULL, avg_account_age integer NULL, avg_amount_offered integer NULL, avg_bscore integer NULL, avg_bscore_range integer NULL, avg_credit_limit integer NULL, avg_hilife_balance integer NULL, avg_income integer NULL, has_bad_exclusion_score tinyint NULL, has_negative_block_code tinyint NULL, has_payment_projection_score tinyint NULL, has_staff_loan tinyint NULL, invalid_logo_1stloan tinyint NULL, mail_date varchar(50) NULL, mailing_indicator tinyint NULL, mailing_key tinyint NULL, max_account_age integer NULL, max_agreement_date varchar(50) NULL, max_amount_offered integer NULL, max_bscore integer NULL, max_bscore_date varchar(50) NULL, max_bscore_range integer NULL, max_credit_limit integer NULL, max_hilife_balance integer NULL, max_income integer NULL, max_open_date varchar(50) NULL, min_account_age integer NULL, min_agreement_date varchar(50) NULL, min_amount_offered integer NULL, min_bscore integer NULL, min_bscore_date varchar(50) NULL, min_bscore_range integer NULL, min_credit_limit integer NULL, min_hilife_balance integer NULL, min_income integer NULL, min_open_date varchar(50) NULL, no_accounts tinyint NULL, no_active_accounts tinyint NULL, no_inactive_accounts tinyint NULL, rcs_offer_campaign_no varchar(20) NULL, rcs_offer_expiry_date varchar(50) NULL, rcs_offer_interest_rate tinyint NULL, sum_contract_delinquency integer NULL, sum_payment_recency tinyint NULL, sum_return_mail integer NULL, sum_warning_code integer NULL, account_active char(1) NULL, account_age integer NULL, account_id bigint NULL, account_status char(3) NULL, agreement_date varchar(50) NULL, amount_offered double NULL, application_id varchar(20) NULL, block_code_1 varchar(20) NULL, block_code_2 varchar(20) NULL, bscore integer NULL, bscore_date varchar(50) NULL, bscore_range varchar(20) NULL, campaign_no varchar(20) NULL, contract_date varchar(50) NULL, contract_delinquency tinyint NULL, credit_limit double NULL, crlife_admin double NULL, current_balance double NULL, customer_id char(30) NULL, entity_id integer NULL, expiry_date varchar(50) NULL, group_id integer NULL, hilife_balance double NULL, income double NULL, installment_value double NULL, interest_rate double NULL, logo_id varchar(3) NULL, open_date varchar(50) NULL, palink_type varchar(20) NULL, payment_recency integer NULL, payout_amount double NULL, payout_date varchar(50) NULL, physical_address_id char(32) NULL, principal_value double NULL, prodcatid integer NULL, res_address_id char(32) NULL, return_mail integer NULL, risk_rating varchar(20) NULL, tag_date varchar(50) NULL, term_period tinyint NULL, warning_code tinyint NULL, warning_code1 integer NULL, warning_code2 integer NULL, warning_code3 integer NULL, write_off_balance double NULL, accno varchar(20) NULL, account_no varchar(20) NULL, address_class varchar(1) NULL, address_code varchar(3) NULL, address_id_ia varchar(30) NULL, address_line1 varchar(65) NULL, address_line2 varchar(65) NULL, address_line3 varchar(65) NULL, address_line4 varchar(65) NULL, address_line5 varchar(65) NULL, address_source_date varchar(50) NULL, address_type varchar(3) NULL, address_type_id char(1) NULL, amc varchar(15) NULL, amc_id varchar(10) NULL, building_type varchar(3) NULL, change_status varchar(3) NULL, clean_address_line1 varchar(65) NULL, clean_address_line2 varchar(65) NULL, clean_address_line3 varchar(65) NULL, clean_address_line4 varchar(65) NULL, clean_address_line5 varchar(65) NULL, country varchar(50) NULL, country_code varchar(6) NULL, delivery_code varchar(1) NULL, dpid varchar(20) NULL, lines_in tinyint NULL, lmc varchar(20) NULL, lmc_id varchar(15) NULL, location_type varchar(20) NULL, mpcode varchar(5) NULL, mpname varchar(36) NULL, postcode varchar(10) NULL, region varchar(40) NULL, sec_location_type varchar(20) NULL, spcode varchar(8) NULL, spname varchar(25) NULL, structure_status varchar(3) NULL, suburb varchar(65) NULL, suburb_ethnic_code varchar(2) NULL, suburb_income tinyint NULL, suburb_income_band varchar(2) NULL, suburb_lat double NULL, suburb_long double NULL, tag varchar(40) NULL, town varchar(65) NULL, umc varchar(15) NULL, validity_status varchar(3) NULL, prospect_right bigint null DEFAULT 0, has_email tinyint NULL DEFAULT 0, has_address tinyint NULL DEFAULT 0, has_phone tinyint NULL DEFAULT 0, has_home_phone tinyint NULL DEFAULT 0, has_cell_phone tinyint NULL DEFAULT 0, has_work_phone tinyint NULL DEFAULT 0, chargeoff_date varchar(50) NULL, mloans_index char(30) NULL, filler_space char(1) NULL DEFAULT ' ', filler_empty varchar(1) NULL DEFAULT '', curr_date varchar(50) NULL DEFAULT '', prospect_owner bigint NULL DEFAULT 0, reference_no varchar(20) NULL, c char(30) NULL DEFAULT '0' ) GO
Thanks for the help Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Wed, Aug 6, 2008 at 3:40 PM, Niels Nes
On Wed, Aug 06, 2008 at 03:16:34PM +0200, Stefan Manegold wrote:
On Wed, Aug 06, 2008 at 03:16:28AM -0700, Cronje Fourie wrote:
Hi Stefan...
All your assumptions are correct. - Both systems use a single SATA disk, SWAP and data partition on same disks - Compiled the MonetDB, from Jun-SuperTar, with options: --enable-sql --enable-optimise --nightly-stable - gcc etc are all latest on both machines, gcc 4.2.3 - compiled with 64bit OID's on 64bit machine. - I'm loading plain text data - Including the create table statement at the bottom, it's a whopper :) a couple of date fields are loaded as VARCHAR because the source data format is not compatible with MonetDB.
~ 200 attributes --- "impressive" ;-)
On the load side the VM usage went up to 11GB...on the 64bit system. I can't remember the exact value on the 32bit system but was around 3GB for VM.
Total VM usage of the system, or only of the mserver5 process?
Assuming that all data type you use occupy the same storage with both 32- & 64-bit systems (Niels might comment on this one), the (only?) difference between 32- & 64-bit systems is the size of OIDs and hash values used by MonetDB internally. By default, they are 32-bit on 32-bit systems and 64-bit on 64-bit systems --- the latter obviously occupying more (double the) space than the former. Given that the table doesn't have any 'key constraints', the loading should use the exact same amount of memory. Only during processing (queries or key constraint checking) the hash and oid values play a role. Indeed the other types should on 32 and 64 bit systems be of equal size.
Niels
This is becoming interesting:) Since Niels is saying that the 64-bit oid/hash values play no role during "copy into", then the problem might be indeed in Ubuntu vs KUbuntu versions, as Stefan mentioned. It will be nice to see what the results are with the 64-bit installation of ubuntu, but I believe that it will work fine since I am using also 64-bit ubuntu in my laptop and have never had problems. Officially the difference between ubuntu and kubuntu is only the window manager, ie. gnome vs kde.
At configure time, you can decide to build a 64-bit MonetDB with 32-bit OIDs (and hence hash values) (which --- in short --- will restrict tables to at most 2 billion records as on 32-bit systems --- but that should not be a problem given your 6 million records, right?) by calling configure with option "--enable-oid32". When using the monetdb-install.sh script, you have to resort to the "developers options/features" (cf., `monetdb-install.sh --devhelp`), e.g., as follows
EXTRA_ECONF=--enable-oid32 monetdb-install.sh ...
This might (should) reduce the memory footprint of mserver5 considerably.
One major difference between the two systems is that the 32bit system is a SERVER install, vs full KDE on the 64bit.
I'm now trying to run the same load on a 32bit system, which also has KDE installed and it is also much slower than on the server. Incidentally VM usage on this machine is 1GB
Hm, in case "server" vs. "KDE" make the difference, it's either that KDE consumes very (too?) much memory, leaving only little for other programs (as, e.g., MonetDB), or there's "something strange" going on that "blocks" MonetDB --- I would have no idea what that could be ...
I'm downloading the ubuntu server image at the moment and will install and test on that shortly, then get back to you with the results.
Yes, that would be great. I am / we are curious the hear what you find out!
Stefan
CREATE TABLE PrimalPool ( id10 char(10) NOT NULL, idno varchar(13) NOT NULL, cell_phone varchar(20) NULL, combined_expenses_monthly double NULL, combined_income_monthly double NULL, customer_age_id integer NULL, deceased tinyint NULL, demographic char(10) NULL, do_not_mail tinyint NULL, dob varchar(50) NULL, email varchar(150) NULL, emperica_score integer NULL, empscore_date varchar(50) NULL, ethnicity_id char(1) NULL, forename_1 varchar(20) NULL, forename_2 varchar(20) NULL, forename_3 varchar(20) NULL, gender_id char(1) NULL, home_language char(15) NULL, home_phone varchar(20) NULL, initials varchar(3) NULL, language char(1) NULL, marital_status_id char(1) NULL, market_sector_id tinyint NULL, prospect_age tinyint NULL, surname varchar(50) NULL, title_id varchar(20) NULL, work_phone varchar(20) NULL, bond_rent_payments integer NULL, bureau_installment_monthly double NULL, expense_bond_monthly double NULL, expense_maintenance_monthly double NULL, expense_municipal_services_monthly double NULL, income_tax_monthly double NULL, maint_payments integer NULL, municipal_payments integer NULL, net_income_monthly double NULL, pension_contrib_monthly double NULL, salary_gross_monthly double NULL, salary_other_monthly double NULL, statutory_expenses_monthly double NULL, uif_contrib_monthly double NULL, sbcscore integer NULL, sbcscore_date varchar(50) NULL, source_id integer NULL, account_charge_off tinyint NULL, avg_account_age integer NULL, avg_amount_offered integer NULL, avg_bscore integer NULL, avg_bscore_range integer NULL, avg_credit_limit integer NULL, avg_hilife_balance integer NULL, avg_income integer NULL, has_bad_exclusion_score tinyint NULL, has_negative_block_code tinyint NULL, has_payment_projection_score tinyint NULL, has_staff_loan tinyint NULL, invalid_logo_1stloan tinyint NULL, mail_date varchar(50) NULL, mailing_indicator tinyint NULL, mailing_key tinyint NULL, max_account_age integer NULL, max_agreement_date varchar(50) NULL, max_amount_offered integer NULL, max_bscore integer NULL, max_bscore_date varchar(50) NULL, max_bscore_range integer NULL, max_credit_limit integer NULL, max_hilife_balance integer NULL, max_income integer NULL, max_open_date varchar(50) NULL, min_account_age integer NULL, min_agreement_date varchar(50) NULL, min_amount_offered integer NULL, min_bscore integer NULL, min_bscore_date varchar(50) NULL, min_bscore_range integer NULL, min_credit_limit integer NULL, min_hilife_balance integer NULL, min_income integer NULL, min_open_date varchar(50) NULL, no_accounts tinyint NULL, no_active_accounts tinyint NULL, no_inactive_accounts tinyint NULL, rcs_offer_campaign_no varchar(20) NULL, rcs_offer_expiry_date varchar(50) NULL, rcs_offer_interest_rate tinyint NULL, sum_contract_delinquency integer NULL, sum_payment_recency tinyint NULL, sum_return_mail integer NULL, sum_warning_code integer NULL, account_active char(1) NULL, account_age integer NULL, account_id bigint NULL, account_status char(3) NULL, agreement_date varchar(50) NULL, amount_offered double NULL, application_id varchar(20) NULL, block_code_1 varchar(20) NULL, block_code_2 varchar(20) NULL, bscore integer NULL, bscore_date varchar(50) NULL, bscore_range varchar(20) NULL, campaign_no varchar(20) NULL, contract_date varchar(50) NULL, contract_delinquency tinyint NULL, credit_limit double NULL, crlife_admin double NULL, current_balance double NULL, customer_id char(30) NULL, entity_id integer NULL, expiry_date varchar(50) NULL, group_id integer NULL, hilife_balance double NULL, income double NULL, installment_value double NULL, interest_rate double NULL, logo_id varchar(3) NULL, open_date varchar(50) NULL, palink_type varchar(20) NULL, payment_recency integer NULL, payout_amount double NULL, payout_date varchar(50) NULL, physical_address_id char(32) NULL, principal_value double NULL, prodcatid integer NULL, res_address_id char(32) NULL, return_mail integer NULL, risk_rating varchar(20) NULL, tag_date varchar(50) NULL, term_period tinyint NULL, warning_code tinyint NULL, warning_code1 integer NULL, warning_code2 integer NULL, warning_code3 integer NULL, write_off_balance double NULL, accno varchar(20) NULL, account_no varchar(20) NULL, address_class varchar(1) NULL, address_code varchar(3) NULL, address_id_ia varchar(30) NULL, address_line1 varchar(65) NULL, address_line2 varchar(65) NULL, address_line3 varchar(65) NULL, address_line4 varchar(65) NULL, address_line5 varchar(65) NULL, address_source_date varchar(50) NULL, address_type varchar(3) NULL, address_type_id char(1) NULL, amc varchar(15) NULL, amc_id varchar(10) NULL, building_type varchar(3) NULL, change_status varchar(3) NULL, clean_address_line1 varchar(65) NULL, clean_address_line2 varchar(65) NULL, clean_address_line3 varchar(65) NULL, clean_address_line4 varchar(65) NULL, clean_address_line5 varchar(65) NULL, country varchar(50) NULL, country_code varchar(6) NULL, delivery_code varchar(1) NULL, dpid varchar(20) NULL, lines_in tinyint NULL, lmc varchar(20) NULL, lmc_id varchar(15) NULL, location_type varchar(20) NULL, mpcode varchar(5) NULL, mpname varchar(36) NULL, postcode varchar(10) NULL, region varchar(40) NULL, sec_location_type varchar(20) NULL, spcode varchar(8) NULL, spname varchar(25) NULL, structure_status varchar(3) NULL, suburb varchar(65) NULL, suburb_ethnic_code varchar(2) NULL, suburb_income tinyint NULL, suburb_income_band varchar(2) NULL, suburb_lat double NULL, suburb_long double NULL, tag varchar(40) NULL, town varchar(65) NULL, umc varchar(15) NULL, validity_status varchar(3) NULL, prospect_right bigint null DEFAULT 0, has_email tinyint NULL DEFAULT 0, has_address tinyint NULL DEFAULT 0, has_phone tinyint NULL DEFAULT 0, has_home_phone tinyint NULL DEFAULT 0, has_cell_phone tinyint NULL DEFAULT 0, has_work_phone tinyint NULL DEFAULT 0, chargeoff_date varchar(50) NULL, mloans_index char(30) NULL, filler_space char(1) NULL DEFAULT ' ', filler_empty varchar(1) NULL DEFAULT '', curr_date varchar(50) NULL DEFAULT '', prospect_owner bigint NULL DEFAULT 0, reference_no varchar(20) NULL, c char(30) NULL DEFAULT '0' ) GO
Thanks for the help Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On 06-08-2008 17:40:40 +0200, Lefteris wrote:
This is becoming interesting:)
Since Niels is saying that the 64-bit oid/hash values play no role during "copy into", then the problem might be indeed in Ubuntu vs KUbuntu versions, as Stefan mentioned. It will be nice to see what the results are with the 64-bit installation of ubuntu, but I believe that it will work fine since I am using also 64-bit ubuntu in my laptop and have never had problems.
Officially the difference between ubuntu and kubuntu is only the window manager, ie. gnome vs kde.
It can be a kernel configuration. However, that doesn't sound like it can cause that large memory consumption problems.
Fabian Groffen wrote:
On 06-08-2008 17:40:40 +0200, Lefteris wrote:
This is becoming interesting:)
Since Niels is saying that the 64-bit oid/hash values play no role during "copy into", then the problem might be indeed in Ubuntu vs KUbuntu versions, as Stefan mentioned. It will be nice to see what the results are with the 64-bit installation of ubuntu, but I believe that it will work fine since I am using also 64-bit ubuntu in my laptop and have never had problems.
Officially the difference between ubuntu and kubuntu is only the window manager, ie. gnome vs kde.
It can be a kernel configuration. However, that doesn't sound like it can cause that large memory consumption problems.
Are we talking real hardware platforms? or virtual platforms running under VMware and friends?
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Had it on a REAL 64bit system. Now running on a 64bit VM within my 64bit ubuntu system. Martin Kersten wrote:
Fabian Groffen wrote:
On 06-08-2008 17:40:40 +0200, Lefteris wrote:
This is becoming interesting:)
Since Niels is saying that the 64-bit oid/hash values play no role during "copy into", then the problem might be indeed in Ubuntu vs KUbuntu versions, as Stefan mentioned. It will be nice to see what the results are with the 64-bit installation of ubuntu, but I believe that it will work fine since I am using also 64-bit ubuntu in my laptop and have never had problems.
Officially the difference between ubuntu and kubuntu is only the window manager, ie. gnome vs kde.
It can be a kernel configuration. However, that doesn't sound like it can cause that large memory consumption problems.
Are we talking real hardware platforms? or virtual platforms running under VMware and friends?
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://www.nabble.com/COPY-INTO-on-Ubuntu-64bit-is-slow-tp18833879p18868034.... Sent from the monetdb-users mailing list archive at Nabble.com.
Not sure what you mean with "Total VM usage or just mserver5 process" What I can tell you today is the following: - Created a VMWare workstation system under KUbuntu 64bit 8.04 - latest updates - VM Has - 1Processor, 2GB memory, 30GB disk space - OS = Ubuntu Server 64bit 8.04 with latest updates - Compiled monet from Jun Super Tar with --enable-optimise --enable-sql --nightly-stable - Created database with exact same scripts as used for 32bit systems - started copy into process using script... <code> copy 6157114 records into PrimalPool from '/home/cfourie/temp/PrimalPool.csv' delimiters '|', '\n' null as '' GO --score copy 12240201 records into score from '/home/cfourie/temp/score.csv' delimiters '|', '\n' null as '' GO --mailpot copy 10078100 records into mailpot from '/home/cfourie/temp/mailpot.csv' delimiters '|', '\n' null as '' GO </code> - Got back to the office this morning. The first copy of 6mil records was still running, mserver5 consuming 19GB VM according to top I'm stumped :) Next test - - keep existing ubuntu server install within VM - recompile monet same as before BUT with 32bitOID (2 billion record limit, I think we'll be ok for now :) - recreate DB same as before - attempt to load data again As a point of interest I'll enable memory debug and major IO mask. Maybe that gives me a pointer. Also related to this...Is there a known limit of the number of columns in a monetdb table? Could this be an issue? That will be my next step ;) -- View this message in context: http://www.nabble.com/COPY-INTO-on-Ubuntu-64bit-is-slow-tp18833879p18868004.... Sent from the monetdb-users mailing list archive at Nabble.com.
Having recompiled with 32bit OID's I was able to load the 6 million records in 4h23m. Resource usage on the VM IS - 7.5GB VIRT (I always think this is virtual memory, but it's virtual address space correct?) - 840MB RES System memory - PHYSICAL - 1.2GB used - VIRTUAL - 190MB I imagine the reason why the load was taking so long is due to the VM using 190MB virtual memory, which will kill performance. Still intrigued by the 64bit OID speed issue. But I guess double the VM usage space can result in exponential performance degradation. Cronje Cronje Fourie wrote:
Not sure what you mean with "Total VM usage or just mserver5 process"
What I can tell you today is the following: - Created a VMWare workstation system under KUbuntu 64bit 8.04 - latest updates - VM Has - 1Processor, 2GB memory, 30GB disk space - OS = Ubuntu Server 64bit 8.04 with latest updates - Compiled monet from Jun Super Tar with --enable-optimise --enable-sql --nightly-stable - Created database with exact same scripts as used for 32bit systems - started copy into process using script... <code> copy 6157114 records into PrimalPool from '/home/cfourie/temp/PrimalPool.csv' delimiters '|', '\n' null as '' GO --score copy 12240201 records into score from '/home/cfourie/temp/score.csv' delimiters '|', '\n' null as '' GO --mailpot copy 10078100 records into mailpot from '/home/cfourie/temp/mailpot.csv' delimiters '|', '\n' null as '' GO </code> - Got back to the office this morning. The first copy of 6mil records was still running, mserver5 consuming 19GB VM according to top
I'm stumped :)
Next test - - keep existing ubuntu server install within VM - recompile monet same as before BUT with 32bitOID (2 billion record limit, I think we'll be ok for now :) - recreate DB same as before - attempt to load data again
As a point of interest I'll enable memory debug and major IO mask. Maybe that gives me a pointer.
Also related to this...Is there a known limit of the number of columns in a monetdb table? Could this be an issue? That will be my next step ;)
-- View this message in context: http://www.nabble.com/COPY-INTO-on-Ubuntu-64bit-is-slow-tp18833879p18887127.... Sent from the monetdb-users mailing list archive at Nabble.com.
On Wed, Aug 06, 2008 at 03:16:28AM -0700, Cronje Fourie wrote:
Hi Stefan...
All your assumptions are correct. - Both systems use a single SATA disk, SWAP and data partition on same disks - Compiled the MonetDB, from Jun-SuperTar, with options: --enable-sql --enable-optimise --nightly-stable - gcc etc are all latest on both machines, gcc 4.2.3 - compiled with 64bit OID's on 64bit machine. - I'm loading plain text data - Including the create table statement at the bottom, it's a whopper :) a couple of date fields are loaded as VARCHAR because the source data format is not compatible with MonetDB.
On the load side the VM usage went up to 11GB...on the 64bit system. I can't remember the exact value on the 32bit system but was around 3GB for VM.
A part of the difference could be explained from the types. Varchar and char use atleast 8 bytes on the 64bit system (only 4 on 32 bit). Also the var heaps will take more space on the 64 bit system, because of the double elimination hash tables. Niels
One major difference between the two systems is that the 32bit system is a SERVER install, vs full KDE on the 64bit.
I'm now trying to run the same load on a 32bit system, which also has KDE installed and it is also much slower than on the server. Incidentally VM usage on this machine is 1GB
I'm downloading the ubuntu server image at the moment and will install and test on that shortly, then get back to you with the results.
CREATE TABLE PrimalPool ( id10 char(10) NOT NULL, idno varchar(13) NOT NULL, cell_phone varchar(20) NULL, combined_expenses_monthly double NULL, combined_income_monthly double NULL, customer_age_id integer NULL, deceased tinyint NULL, demographic char(10) NULL, do_not_mail tinyint NULL, dob varchar(50) NULL, email varchar(150) NULL, emperica_score integer NULL, empscore_date varchar(50) NULL, ethnicity_id char(1) NULL, forename_1 varchar(20) NULL, forename_2 varchar(20) NULL, forename_3 varchar(20) NULL, gender_id char(1) NULL, home_language char(15) NULL, home_phone varchar(20) NULL, initials varchar(3) NULL, language char(1) NULL, marital_status_id char(1) NULL, market_sector_id tinyint NULL, prospect_age tinyint NULL, surname varchar(50) NULL, title_id varchar(20) NULL, work_phone varchar(20) NULL, bond_rent_payments integer NULL, bureau_installment_monthly double NULL, expense_bond_monthly double NULL, expense_maintenance_monthly double NULL, expense_municipal_services_monthly double NULL, income_tax_monthly double NULL, maint_payments integer NULL, municipal_payments integer NULL, net_income_monthly double NULL, pension_contrib_monthly double NULL, salary_gross_monthly double NULL, salary_other_monthly double NULL, statutory_expenses_monthly double NULL, uif_contrib_monthly double NULL, sbcscore integer NULL, sbcscore_date varchar(50) NULL, source_id integer NULL, account_charge_off tinyint NULL, avg_account_age integer NULL, avg_amount_offered integer NULL, avg_bscore integer NULL, avg_bscore_range integer NULL, avg_credit_limit integer NULL, avg_hilife_balance integer NULL, avg_income integer NULL, has_bad_exclusion_score tinyint NULL, has_negative_block_code tinyint NULL, has_payment_projection_score tinyint NULL, has_staff_loan tinyint NULL, invalid_logo_1stloan tinyint NULL, mail_date varchar(50) NULL, mailing_indicator tinyint NULL, mailing_key tinyint NULL, max_account_age integer NULL, max_agreement_date varchar(50) NULL, max_amount_offered integer NULL, max_bscore integer NULL, max_bscore_date varchar(50) NULL, max_bscore_range integer NULL, max_credit_limit integer NULL, max_hilife_balance integer NULL, max_income integer NULL, max_open_date varchar(50) NULL, min_account_age integer NULL, min_agreement_date varchar(50) NULL, min_amount_offered integer NULL, min_bscore integer NULL, min_bscore_date varchar(50) NULL, min_bscore_range integer NULL, min_credit_limit integer NULL, min_hilife_balance integer NULL, min_income integer NULL, min_open_date varchar(50) NULL, no_accounts tinyint NULL, no_active_accounts tinyint NULL, no_inactive_accounts tinyint NULL, rcs_offer_campaign_no varchar(20) NULL, rcs_offer_expiry_date varchar(50) NULL, rcs_offer_interest_rate tinyint NULL, sum_contract_delinquency integer NULL, sum_payment_recency tinyint NULL, sum_return_mail integer NULL, sum_warning_code integer NULL, account_active char(1) NULL, account_age integer NULL, account_id bigint NULL, account_status char(3) NULL, agreement_date varchar(50) NULL, amount_offered double NULL, application_id varchar(20) NULL, block_code_1 varchar(20) NULL, block_code_2 varchar(20) NULL, bscore integer NULL, bscore_date varchar(50) NULL, bscore_range varchar(20) NULL, campaign_no varchar(20) NULL, contract_date varchar(50) NULL, contract_delinquency tinyint NULL, credit_limit double NULL, crlife_admin double NULL, current_balance double NULL, customer_id char(30) NULL, entity_id integer NULL, expiry_date varchar(50) NULL, group_id integer NULL, hilife_balance double NULL, income double NULL, installment_value double NULL, interest_rate double NULL, logo_id varchar(3) NULL, open_date varchar(50) NULL, palink_type varchar(20) NULL, payment_recency integer NULL, payout_amount double NULL, payout_date varchar(50) NULL, physical_address_id char(32) NULL, principal_value double NULL, prodcatid integer NULL, res_address_id char(32) NULL, return_mail integer NULL, risk_rating varchar(20) NULL, tag_date varchar(50) NULL, term_period tinyint NULL, warning_code tinyint NULL, warning_code1 integer NULL, warning_code2 integer NULL, warning_code3 integer NULL, write_off_balance double NULL, accno varchar(20) NULL, account_no varchar(20) NULL, address_class varchar(1) NULL, address_code varchar(3) NULL, address_id_ia varchar(30) NULL, address_line1 varchar(65) NULL, address_line2 varchar(65) NULL, address_line3 varchar(65) NULL, address_line4 varchar(65) NULL, address_line5 varchar(65) NULL, address_source_date varchar(50) NULL, address_type varchar(3) NULL, address_type_id char(1) NULL, amc varchar(15) NULL, amc_id varchar(10) NULL, building_type varchar(3) NULL, change_status varchar(3) NULL, clean_address_line1 varchar(65) NULL, clean_address_line2 varchar(65) NULL, clean_address_line3 varchar(65) NULL, clean_address_line4 varchar(65) NULL, clean_address_line5 varchar(65) NULL, country varchar(50) NULL, country_code varchar(6) NULL, delivery_code varchar(1) NULL, dpid varchar(20) NULL, lines_in tinyint NULL, lmc varchar(20) NULL, lmc_id varchar(15) NULL, location_type varchar(20) NULL, mpcode varchar(5) NULL, mpname varchar(36) NULL, postcode varchar(10) NULL, region varchar(40) NULL, sec_location_type varchar(20) NULL, spcode varchar(8) NULL, spname varchar(25) NULL, structure_status varchar(3) NULL, suburb varchar(65) NULL, suburb_ethnic_code varchar(2) NULL, suburb_income tinyint NULL, suburb_income_band varchar(2) NULL, suburb_lat double NULL, suburb_long double NULL, tag varchar(40) NULL, town varchar(65) NULL, umc varchar(15) NULL, validity_status varchar(3) NULL, prospect_right bigint null DEFAULT 0, has_email tinyint NULL DEFAULT 0, has_address tinyint NULL DEFAULT 0, has_phone tinyint NULL DEFAULT 0, has_home_phone tinyint NULL DEFAULT 0, has_cell_phone tinyint NULL DEFAULT 0, has_work_phone tinyint NULL DEFAULT 0, chargeoff_date varchar(50) NULL, mloans_index char(30) NULL, filler_space char(1) NULL DEFAULT ' ', filler_empty varchar(1) NULL DEFAULT '', curr_date varchar(50) NULL DEFAULT '', prospect_owner bigint NULL DEFAULT 0, reference_no varchar(20) NULL, c char(30) NULL DEFAULT '0' ) GO
Thanks for the help Cronje
Stefan Manegold wrote:
On Tue, Aug 05, 2008 at 05:25:09PM +0200, Cronje Fourie wrote:
Hi guys....
I've got to machines running Monet. 1. Quad core Intel with 4GB RAM, Ubuntu 8.04 32BIT (server ed) 2. AMD64 Quad Core, 4GB RAM, KUbuntu 8.04 64BIT (kde 3.5.9)
On system 1 I can do a "copy into" of 6million records in no time at all :)
On system 2 the same function, with the exact same data takes forever.
Any thoughts on the reason? Is this a memory related issue?
I assume that you configure MonetDB on both machines with the same (if any) optimization and/or debugging options and compiled it with the same (versions of) tools (in particular compiler).
Morover, I assume that you built a 64-bit version with 64-bit OIDs on the 64-bit machine (should be the default on a 64-bit machines).
Finally, I assume that both machines have comparable I/O systems (disks).
Did/can you check how big your mserver5 process is growing on both machines during the copy into, and/or whether either machines (obviously the second one) is swapping heavily during the copy into?
How wide are your records, i.e., what does your 'create table' statement look like?
Do you load from a plain (uncompressed) file or from a compressed (how) file?
Stefan
Cronje
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://www.nabble.com/COPY-INTO-on-Ubuntu-64bit-is-slow-tp18833879p18848282.... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (7)
-
Cronje Fourie
-
Cronje Fourie
-
Fabian Groffen
-
Lefteris
-
Martin Kersten
-
Niels Nes
-
Stefan Manegold