Hello, (google translate) I have a query that takes a long time : select count(distinct idvisiteur) from src_inbox__at__lm__visites; After two hours of running, it is not over , I'll kill her . if I look at the heapsize column storage ( ) sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+=============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 01/00/10025 | 315819936 | 5 | 2526559488 | 27755937792 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 76/7667 | 315819936 | 19 | 2526559488 | 80449110016 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 62/6201 | 315819936 | 2 | 1263279744 | 2222391296 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 77/7704 | 315819936 | 5 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 30/3064 | 315819936 | 19 | 2526559488 | 10741678080 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 31/3171 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 67/6706 | 315819936 | 12 | 1263279744 | 2077556736 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 40/4071 | 315819936 | 10 | 2526559488 | 22635741184 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 01/26/12673 | 315819936 | 8 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 01/44/14410 | 315819936 | 17 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 42/4234 | 315819936 | 1 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 01/41/14100 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/22/12277 | 315819936 | 0 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 65/6573 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/04/10434 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/22/12264 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 56/5615 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/02/10211 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/23/12370 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+ Is the heapsize column represents the amount of data to be placed in RAM ? I have 64GB of RAM on the server , so the idvisiteur column can not be held in RAM. Is this true? ( 64 000 000 000 < 80 449 110 016 ) I execute a vacuum (experimental command ?) : https://www.monetdb.org/Documentation/UserGuide/AutoloadingScript (Table vacuum commands (experimental!)) call vacuum('sys','src_inbox__at__lm__visites'); sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 36 | 315819936 | 5 | 2526559488 | 4906811392 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 64/6435 | 315819936 | 19 | 1263279744 | 3667394560 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 66/6640 | 315819936 | 2 | 1263279744 | 40042496 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 67/6743 | 315819936 | 5 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 70/7045 | 315819936 | 19 | 2526559488 | 7576551424 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 71/7152 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 72/7252 | 315819936 | 12 | 1263279744 | 106561536 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 73/7353 | 315819936 | 10 | 1263279744 | 1362165760 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 74/7453 | 315819936 | 8 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 75/7554 | 315819936 | 17 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 76/7655 | 315819936 | 1 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 77/7761 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/00/10064 | 315819936 | 0 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 01/01/10165 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/02/10270 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/03/10372 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 01/04/10476 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/06/10600 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/07/10703 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+ heapsize of idvisteur = 3 667 394 560 (3,6GB) instead of 80 449 110 016 (80 GB) my query is present MUCH faster sql>select count(distinct idvisiteur) from src_inbox__at__lm__visites; +----------+ | L1 | +==========+ | 75528081 | +----------+ 1 tuple (4m 22s) HOURA !!! but WHY WHY WHY ? I want to understand Thank Pierre -- 1G6 52 route de bischwiller 67300 Schiltigheim Société de Services et de Formations en Logiciels Libres http://1g6.biz Tél : 06 64 63 70 35
Hello, I would guess you are doing a lot of deletes from the table. Monet does not delete rows physically but merely marks them as deleted. Vacuum command removes deleted rows physically and table scan does not go over 80 GB but over 3.6 GB. Hope that helps, Radovan On 10/22/2014 11:44 PM, Pierre-Adrien Coustillas wrote:
Hello,
(google translate)
I have a query that takes a long time : select count(distinct idvisiteur) from src_inbox__at__lm__visites;
After two hours of running, it is not over, I'll kill her.
if I look at the heapsize column storage () sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+=============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 01/00/10025 | 315819936 | 5 | 2526559488 | 27755937792 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 76/7667 | 315819936 | 19 | 2526559488 | 80449110016 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 62/6201 | 315819936 | 2 | 1263279744 | 2222391296 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 77/7704 | 315819936 | 5 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 30/3064 | 315819936 | 19 | 2526559488 | 10741678080 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 31/3171 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 67/6706 | 315819936 | 12 | 1263279744 | 2077556736 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 40/4071 | 315819936 | 10 | 2526559488 | 22635741184 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 01/26/12673 | 315819936 | 8 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 01/44/14410 | 315819936 | 17 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 42/4234 | 315819936 | 1 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 01/41/14100 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/22/12277 | 315819936 | 0 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 65/6573 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/04/10434 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/22/12264 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 56/5615 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/02/10211 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/23/12370 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+
Is the heapsize column represents the amount of data to be placed in RAM ? I have 64GB of RAM on the server, so the idvisiteur column can not be held in RAM. Is this true? ( 64 000 000 000 < 80 449 110 016 )
I execute a vacuum (experimental command ?) : https://www.monetdb.org/Documentation/UserGuide/AutoloadingScript (Table vacuum commands (experimental!)) call vacuum('sys','src_inbox__at__lm__visites');
sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 36 | 315819936 | 5 | 2526559488 | 4906811392 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 64/6435 | 315819936 | 19 | 1263279744 | 3667394560 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 66/6640 | 315819936 | 2 | 1263279744 | 40042496 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 67/6743 | 315819936 | 5 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 70/7045 | 315819936 | 19 | 2526559488 | 7576551424 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 71/7152 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 72/7252 | 315819936 | 12 | 1263279744 | 106561536 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 73/7353 | 315819936 | 10 | 1263279744 | 1362165760 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 74/7453 | 315819936 | 8 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 75/7554 | 315819936 | 17 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 76/7655 | 315819936 | 1 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 77/7761 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/00/10064 | 315819936 | 0 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 01/01/10165 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/02/10270 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/03/10372 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 01/04/10476 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/06/10600 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/07/10703 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+
heapsize of idvisteur = 3 667 394 560 (3,6GB) instead of 80 449 110 016 (80 GB)
my query is present MUCH faster sql>select count(distinct idvisiteur) from src_inbox__at__lm__visites; +----------+ | L1 | +==========+ | 75528081 | +----------+ 1 tuple (4m 22s)
HOURA !!!
but WHY WHY WHY ? I want to understand
Thank
Pierre
-- 1G6 52 route de bischwiller 67300 Schiltigheim Société de Services et de Formations en Logiciels Libres http://1g6.biz Tél : 06 64 63 70 35
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi, in addition to Radovan's perfect answer (Thanks!!), please also see https://www.monetdb.org/Documentation/Manuals/SQLreference/Transactions about transactions and deletes (and yes, the vacuum operation is (for now) still *experimental*) and https://www.monetdb.org/Documentation/MemoryClaim about memory usage and requirements. Best, Stefan ----- Original Message -----
Hello, I would guess you are doing a lot of deletes from the table. Monet does not delete rows physically but merely marks them as deleted. Vacuum command removes deleted rows physically and table scan does not go over 80 GB but over 3.6 GB. Hope that helps, Radovan
On 10/22/2014 11:44 PM, Pierre-Adrien Coustillas wrote:
Hello,
(google translate)
I have a query that takes a long time : select count(distinct idvisiteur) from src_inbox__at__lm__visites;
After two hours of running, it is not over, I'll kill her.
if I look at the heapsize column storage () sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+=============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 01/00/10025 | 315819936 | 5 | 2526559488 | 27755937792 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 76/7667 | 315819936 | 19 | 2526559488 | 80449110016 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 62/6201 | 315819936 | 2 | 1263279744 | 2222391296 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 77/7704 | 315819936 | 5 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 30/3064 | 315819936 | 19 | 2526559488 | 10741678080 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 31/3171 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 67/6706 | 315819936 | 12 | 1263279744 | 2077556736 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 40/4071 | 315819936 | 10 | 2526559488 | 22635741184 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 01/26/12673 | 315819936 | 8 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 01/44/14410 | 315819936 | 17 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 42/4234 | 315819936 | 1 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 01/41/14100 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/22/12277 | 315819936 | 0 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 65/6573 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/04/10434 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/22/12264 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 56/5615 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/02/10211 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/23/12370 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+
Is the heapsize column represents the amount of data to be placed in RAM ? I have 64GB of RAM on the server, so the idvisiteur column can not be held in RAM. Is this true? ( 64 000 000 000 < 80 449 110 016 )
I execute a vacuum (experimental command ?) : https://www.monetdb.org/Documentation/UserGuide/AutoloadingScript (Table vacuum commands (experimental!)) call vacuum('sys','src_inbox__at__lm__visites');
sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 36 | 315819936 | 5 | 2526559488 | 4906811392 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 64/6435 | 315819936 | 19 | 1263279744 | 3667394560 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 66/6640 | 315819936 | 2 | 1263279744 | 40042496 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 67/6743 | 315819936 | 5 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 70/7045 | 315819936 | 19 | 2526559488 | 7576551424 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 71/7152 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 72/7252 | 315819936 | 12 | 1263279744 | 106561536 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 73/7353 | 315819936 | 10 | 1263279744 | 1362165760 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 74/7453 | 315819936 | 8 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 75/7554 | 315819936 | 17 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 76/7655 | 315819936 | 1 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 77/7761 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/00/10064 | 315819936 | 0 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 01/01/10165 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/02/10270 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/03/10372 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 01/04/10476 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/06/10600 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/07/10703 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+
heapsize of idvisteur = 3 667 394 560 (3,6GB) instead of 80 449 110 016 (80 GB)
my query is present MUCH faster sql>select count(distinct idvisiteur) from src_inbox__at__lm__visites; +----------+ | L1 | +==========+ | 75528081 | +----------+ 1 tuple (4m 22s)
HOURA !!!
but WHY WHY WHY ? I want to understand
Thank
Pierre
-- 1G6 52 route de bischwiller 67300 Schiltigheim Société de Services et de Formations en Logiciels Libres http://1g6.biz Tél : 06 64 63 70 35
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hello,
On the table I just "drop " and a "create as select " .
I do not perform delete
DROP TABLE src_inbox__at__lm__visites ;
CREATE TABLE AS src_inbox__at__lm__visites (
SELECT SRC . * BDM.id_unique
FROM CBC src__at__lm__visites
INNER JOIN bdm_index_client_cookies BDM
ON = SRC.idvisiteur BDM.idvisiteur
AND BDM.id_univers = 1) WITH DATA ;
this script is run every night
how to explain the fragmentation?
--
1G6
52 route de bischwiller
67300 Schiltigheim
Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tél : 06 64 63 70 35
----- Mail original -----
De: "Stefan Manegold"
Hello, I would guess you are doing a lot of deletes from the table. Monet does not delete rows physically but merely marks them as deleted. Vacuum command removes deleted rows physically and table scan does not go over 80 GB but over 3.6 GB. Hope that helps, Radovan
On 10/22/2014 11:44 PM, Pierre-Adrien Coustillas wrote:
Hello,
(google translate)
I have a query that takes a long time : select count(distinct idvisiteur) from src_inbox__at__lm__visites;
After two hours of running, it is not over, I'll kill her.
if I look at the heapsize column storage () sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+=============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 01/00/10025 | 315819936 | 5 | 2526559488 | 27755937792 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 76/7667 | 315819936 | 19 | 2526559488 | 80449110016 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 62/6201 | 315819936 | 2 | 1263279744 | 2222391296 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 77/7704 | 315819936 | 5 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 30/3064 | 315819936 | 19 | 2526559488 | 10741678080 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 31/3171 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 67/6706 | 315819936 | 12 | 1263279744 | 2077556736 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 40/4071 | 315819936 | 10 | 2526559488 | 22635741184 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 01/26/12673 | 315819936 | 8 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 01/44/14410 | 315819936 | 17 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 42/4234 | 315819936 | 1 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 01/41/14100 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/22/12277 | 315819936 | 0 | 315819936 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 65/6573 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/04/10434 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/22/12264 | 315819936 | 0 | 631639872 | 65536 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 56/5615 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/02/10211 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/23/12370 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+
Is the heapsize column represents the amount of data to be placed in RAM ? I have 64GB of RAM on the server, so the idvisiteur column can not be held in RAM. Is this true? ( 64 000 000 000 < 80 449 110 016 )
I execute a vacuum (experimental command ?) : https://www.monetdb.org/Documentation/UserGuide/AutoloadingScript (Table vacuum commands (experimental!)) call vacuum('sys','src_inbox__at__lm__visites');
sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites'; +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +========+============================+=====================+=========+=============+===========+===========+============+============+=========+========+ | sys | src_inbox__at__lm__visites | idvisite | varchar | 36 | 315819936 | 5 | 2526559488 | 4906811392 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteur | varchar | 64/6435 | 315819936 | 19 | 1263279744 | 3667394560 | 0 | false | | sys | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 66/6640 | 315819936 | 2 | 1263279744 | 40042496 | 0 | false | | sys | src_inbox__at__lm__visites | siteniveau1 | varchar | 67/6743 | 315819936 | 5 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | debutvisite | varchar | 70/7045 | 315819936 | 19 | 2526559488 | 7576551424 | 0 | false | | sys | src_inbox__at__lm__visites | catvisiteur | varchar | 71/7152 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | region | varchar | 72/7252 | 315819936 | 12 | 1263279744 | 106561536 | 0 | false | | sys | src_inbox__at__lm__visites | ville | varchar | 73/7353 | 315819936 | 10 | 1263279744 | 1362165760 | 0 | false | | sys | src_inbox__at__lm__visites | os | varchar | 74/7453 | 315819936 | 8 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | browser | varchar | 75/7554 | 315819936 | 17 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | sourcevisite | varchar | 76/7655 | 315819936 | 1 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | idcampagne | varchar | 77/7761 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | prefixe | varchar | 01/00/10064 | 315819936 | 0 | 315819936 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | libcampagne | varchar | 01/01/10165 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | emailing | varchar | 01/02/10270 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | dateenvois | varchar | 01/03/10372 | 315819936 | 0 | 631639872 | 73728 | 0 | false | | sys | src_inbox__at__lm__visites | nbpages | int | 01/04/10476 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | dureevisite | time | 01/06/10600 | 315819936 | 4 | 1263279744 | 0 | 0 | false | | sys | src_inbox__at__lm__visites | id_unique | int | 01/07/10703 | 315819936 | 4 | 1263279744 | 0 | 0 | false | +--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+
heapsize of idvisteur = 3 667 394 560 (3,6GB) instead of 80 449 110 016 (80 GB)
my query is present MUCH faster sql>select count(distinct idvisiteur) from src_inbox__at__lm__visites; +----------+ | L1 | +==========+ | 75528081 | +----------+ 1 tuple (4m 22s)
HOURA !!!
but WHY WHY WHY ? I want to understand
Thank
Pierre
-- 1G6 52 route de bischwiller 67300 Schiltigheim Société de Services et de Formations en Logiciels Libres http://1g6.biz Tél : 06 64 63 70 35
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Pierre-Adrien Coustillas
-
Radovan Bičiště
-
Stefan Manegold