[Low level] Understanding duplicate entries in delta BAT (delete BAT)
Hi,

As suggested by MonetDB Solutions, we are asking about this topic here.

We are studying how MonetDB stores its database on disk. So far we have been able to understand pretty much all the files (BAT, journal, BBP.dir, heap, hash, imprints, and so on). We have custom tools to decode and dump them (experimental at this point).

One thing remains unclear, however. We thought that a "delta BAT", the BAT assigned to an SQL table to list the OIDs of rows that were deleted (named "D_<schema>_<table>" internally), would contain only *unique* values, since the same row can never be deleted twice. But on several databases that we are running, we found that some of these BATs contain duplicates. A lot of duplicates, actually, especially in the D_sys__columns and D_sys__tables BATs (for the sys._columns and sys._tables system tables respectively). Some databases do not have this "issue", although they all have the same schema and process the same kind of data as the others.

Can someone explain whether duplicates are expected in these BATs?
Here is an excerpt of D_sys__tables:

    # hexdump -C 02/210.tail
    00000000  2e 00 00 00 00 00 00 00  2f 00 00 00 00 00 00 00  |......../.......|
    00000010  30 00 00 00 00 00 00 00  31 00 00 00 00 00 00 00  |0.......1.......|
    00000020  32 00 00 00 00 00 00 00  33 00 00 00 00 00 00 00  |2.......3.......|
    00000030  34 00 00 00 00 00 00 00  35 00 00 00 00 00 00 00  |4.......5.......|
    00000040  36 00 00 00 00 00 00 00  37 00 00 00 00 00 00 00  |6.......7.......|
    00000050  42 00 00 00 00 00 00 00  43 00 00 00 00 00 00 00  |B.......C.......|
    00000060  44 00 00 00 00 00 00 00  45 00 00 00 00 00 00 00  |D.......E.......|
    00000070  46 00 00 00 00 00 00 00  47 00 00 00 00 00 00 00  |F.......G.......|
    00000080  48 00 00 00 00 00 00 00  49 00 00 00 00 00 00 00  |H.......I.......|
    00000090  4a 00 00 00 00 00 00 00  4b 00 00 00 00 00 00 00  |J.......K.......|
    000000a0  4c 00 00 00 00 00 00 00  4d 00 00 00 00 00 00 00  |L.......M.......|
    000000b0  4e 00 00 00 00 00 00 00  4f 00 00 00 00 00 00 00  |N.......O.......|
    000000c0  50 00 00 00 00 00 00 00  50 00 00 00 00 00 00 00  |P.......P.......|
    000000d0  51 00 00 00 00 00 00 00  50 00 00 00 00 00 00 00  |Q.......P.......|
    000000e0  51 00 00 00 00 00 00 00  52 00 00 00 00 00 00 00  |Q.......R.......|
    000000f0  50 00 00 00 00 00 00 00  51 00 00 00 00 00 00 00  |P.......Q.......|
    00000100  52 00 00 00 00 00 00 00  53 00 00 00 00 00 00 00  |R.......S.......|
    00000110  50 00 00 00 00 00 00 00  51 00 00 00 00 00 00 00  |P.......Q.......|
    00000120  52 00 00 00 00 00 00 00  53 00 00 00 00 00 00 00  |R.......S.......|
    ...

You can clearly see duplicate OIDs (the first one being 0x50 at offset 0xc8).

Its entry in BBP.dir (split into sections):

    136 32 tmp_210 tmpr_210 02/210 610523782 2 171807 0 0 171807 172032 0 0 0 0
    void 0 1 1793 0 0 0 0 0 1000651 0 0 0
    oid 8 0 1024 24 25 27 1 46 773603235 1374456 1376256 1

We are using: MonetDB 5 server v11.21.14 (64-bit, 64-bit oids, 128-bit integers).

--
Frédéric Jolliton
Sécuractive
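For readers who want to reproduce the check, here is a minimal sketch (not the custom tools mentioned above) of how duplicates in such a tail file could be detected. It assumes, based only on the hexdump, that the tail heap is a flat array of 64-bit little-endian OIDs. The function names and the `count` parameter are hypothetical; `count` is meant to be the number of valid entries, which in the BBP.dir entry above appears to be 171807 (consistent with 1374456 bytes / 8), though that reading of the fields is an assumption.

```python
import struct
from collections import Counter

OID_SIZE = 8  # bytes per OID; assumes 64-bit little-endian OIDs, as the hexdump suggests


def decode_oids(raw, count=None):
    """Decode a tail heap image into a list of OIDs.

    `count` is the number of valid entries (hypothetically taken from BBP.dir).
    When omitted, every complete 8-byte word in `raw` is decoded.
    """
    available = len(raw) // OID_SIZE
    n = available if count is None else min(count, available)
    return list(struct.unpack_from(f"<{n}Q", raw, 0))


def find_duplicates(oids):
    """Return {oid: occurrences} for every OID that appears more than once."""
    return {oid: c for oid, c in Counter(oids).items() if c > 1}


def first_duplicate_offset(oids):
    """Byte offset of the first entry that repeats an earlier OID, or None."""
    seen = set()
    for index, oid in enumerate(oids):
        if oid in seen:
            return index * OID_SIZE
        seen.add(oid)
    return None
```

To use it, read the file in binary mode (for example `open("02/210.tail", "rb").read()`), pass the bytes to `decode_oids` with the entry count, and feed the result to the two helpers. Limiting the decode to the entry count matters if the file is larger than the used part of the heap.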
On 18/02/2016 13:06, Frédéric Jolliton wrote:
> Can someone explain whether duplicates are expected in these BATs?

Hi all,

Please, does somebody know whether having duplicate entries in a DEL_bat is an issue or not? At the very least it seems unnecessary. If it is an issue, we propose to investigate further.

Thanks.

--
Guillaume Savary
Securactive R&D
On Wed, Feb 24, 2016 at 09:35:36AM +0100, Guillaume Savary wrote:
> On 18/02/2016 13:06, Frédéric Jolliton wrote:
>> Can someone explain whether duplicates are expected in these BATs?
>
> Hi all,
>
> Please, does somebody know whether having duplicate entries in a DEL_bat is an issue or not? At the very least it seems unnecessary. If it is an issue, we propose to investigate further.
The D_bats hold row ids. As we currently don't recycle rows, it seems strange to have the same (already deleted) row in there again.

Niels
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl
Hi,
For 4 months I have had a test script that produces duplicate tables. Since updating to Jun2016-SP1 I have had no more problems.

Maybe this fix is the reason: https://www.monetdb.org/bugzilla/show_bug.cgi?id=4036

What about you?
Pierre
Participants (4): Frédéric Jolliton, Guillaume Savary, Niels Nes, Pierre-Adrien Coustillas