Hai Ariel,
Just a side question, why are you creating msqldump of your data to reload into MonetDB. Why don’t you immediately prepare CSV data and do the bulk loading?
Regards,
Jennie
> On 31 Mar 2020, at 18:50, Ariel Abadi <aabadi@starbi.com> wrote:
>
> Hi!
>
> Firstly, I would like to thank for the prompt answer!!!
>
> I will wait your answer Panos, to see wether is a bug or not.
>
> In the case this is NOT a bug, and you confirm what Arjen said regarding the format, I think it would be interesting to add such comment on the documentation (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData) which does not refer anything related to the format of the "titles".
> Also, it would be great to use the OFFSET clause disregards of the format of the remainder.
>
> Again, thank you very much to both!!!
>
> Ariel
> PS. The way I solved in this case, was to cut the first 222 rows of each file, and I imported using the Bulk load, which worked perfectly
>
>
> On Tue, Mar 31, 2020 at 7:16 AM Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:
> Hi Arjen,
>
> please see my more recent email about the subject.
>
> Best regards,
> Panos.
>
> On 3/31/20 11:13 AM, Arjen P. de Vries wrote:
> > Hi Panos,
> >
> > Great assistance here!
> >
> > One thing:
> > I can understand why Ariel tried to use OFFSET for that purpose, maybe nice
> > to add to the documentation that skipped lines also should follow the
> > format of the remainder!
> >
> > Greetings,
> >
> > Arjen
> >
> > On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis <
> > panagiotis.koutsourakis@monetdbsolutions.com> wrote:
> >
> >> Hi Ariel,
> >>
> >> The COPY INTO command recognizes a series of records of the form:
> >>
> >> <data><field delimiter><data><field delimiter>...<data><record delimiter>
> >>
> >> For example the command
> >>
> >> COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';
> >>
> >> will recognize lines of the form:
> >>
> >> 1 "MIA" 7 "MIA|115331571990001804" 2000-08-01
> >>
> >> but nothing else. Specifically the first 222 lines in the monet_list.txt
> >> file be rejected by COPY INTO. You need to delete those lines before you
> >> try to load data with copy into.
> >>
> >> Take a look at the documentation of COPY INTO in the MonetDB web page [1].
> >>
> >> Hope this helps,
> >> Panos.
> >>
> >> [1]
> >> https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads
> >>
> >> On 3/30/20 7:42 PM, Ariel Abadi wrote:
> >>> Hi Panos!
> >>>
> >>> Enclosed you can find the file (truncated) that I used and the statement
> >> I
> >>> run on mclient.
> >>>
> >>> COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt'
> >>> USING DELIMITERS '\t','\n','"';
> >>>
> >>>
> >>> Thks
> >>> Ariel
> >>>
> >>>
> >>> On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis <
> >>> panagiotis.koutsourakis@monetdbsolutions.com> wrote:
> >>>
> >>>> Hi Ariel,
> >>>>
> >>>> The first issue is not totally surprising to me. My best guess is that
> >>>> reading from a file is using buffered input while reading from stdin is
> >>>> not. See for example [1] and [2]. I am not sure there is a lot we can
> >> do at
> >>>> the MonetDB side about this.
> >>>>
> >>>> About the second issue could you please post the exact COPY INTO command
> >>>> and the first few lines of the file you are trying to load so that we
> >> can
> >>>> figure out the exact issue?
> >>>>
> >>>> Best regards,
> >>>> Panos.
> >>>>
> >>>> [1]
> >>>>
> >> https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python
> >>>> [2]
> >>>>
> >> https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/17233
> >>>>
> >>>> On 3/26/20 11:51 AM, Ariel Abadi wrote:
> >>>>> Hi!
> >>>>>
> >>>>> After a big battle against the DB, I found 2 issues.
> >>>>>
> >>>>> 1) Running the COPY RECORDS FROM *STDIN *and from *FILE*. Is totally
> >>>>> different in terms of performance. While the first one last 24 minutes,
> >>>> the
> >>>>> second one last only *90 secs.*
> >>>>> 2) As I had generated all those files thru msqldump (which it states
> >> the
> >>>>> "CREATE TABLE" instruction and then the COPY RECORDS)... I try ro run
> >> the
> >>>>> COPY *XXXX OFFSET *RECORDS (in order to avoid the create table
> >>>>> statement)... and the query, because of the word "STDIN" in the file.
> >> It
> >>>>> was not OFFSETTING all the rows I was asking for.
> >>>>>
> >>>>> I hope I've been clear... if not please let me know!
> >>>>>
> >>>>> Ariel
> >>>>>
> >>>>> On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten <martin.kersten@cwi.nl
> >>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi
> >>>>>>
> >>>>>> I am not sure what issue you are now dealing with, but since you are a
> >>>>>> company
> >>>>>> that seems to use (or considering) MonetDB in your commercial
> >> offerings,
> >>>>>> I gladly refer you to MonetDBSolutions(*), where we can give you the
> >>>>>> commercial
> >>>>>> support needed to diagnose your issues.
> >>>>>>
> >>>>>> regards Martin
> >>>>>>
> >>>>>> (*) https://www.monetdbsolutions.com/
> >>>>>>
> >>>>>> On 24/03/2020 23:55, Ariel Abadi wrote:
> >>>>>>> Thanks Martin for your prompt response
> >>>>>>> But I did and msqldump from other database, and now the upload is
> >> being
> >>>>>> done thru the BULK LOAD.
> >>>>>>>
> >>>>>>> /COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" FROM stdin USING
> >>>>>> DELIMITERS '\t','\n','"';/
> >>>>>>>
> >>>>>>>
> >>>>>>> Any other idea ? :'(
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Mar 24, 2020 at 7:37 PM Martin Kersten <
> >> martin.kersten@cwi.nl
> >>>>>> <mailto:martin.kersten@cwi.nl>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> given the numbers I suspect you are loading them as SQL insert
> >>>>>> statements under autocommit mode,
> >>>>>>> Please read the documentation on Bulk loading.
> >>>>>>>
> >>>>>>> regards,Martin
> >>>>>>> Sent from my iPad
> >>>>>>>
> >>>>>>>> On 24 Mar 2020, at 23:12, Ariel Abadi <aabadi@starbi.com
> >> <mailto:
> >>>>>> aabadi@starbi.com>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi!
> >>>>>>>> Maybe somebody can help me, please!
> >>>>>>>>
> >>>>>>>> I'm trying to upload into Monet a file with 14MM records. Is
> >>>> taking
> >>>>>> more than *24 minutes.*
> >>>>>>>> It is a fresh new database.
> >>>>>>>>
> >>>>>>>> Enclosed you will find the server configuration.
> >>>>>>>>
> >>>>>>>> Monetdb Version Nov2019-SP3
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *[root@mn tmp]$ lscpu*
> >>>>>>>> Architecture: x86_64
> >>>>>>>> CPU op-mode(s): 32-bit, 64-bit
> >>>>>>>> Byte Order: Little Endian
> >>>>>>>> CPU(s): 40
> >>>>>>>> On-line CPU(s) list: 0-39
> >>>>>>>> Thread(s) per core: 2
> >>>>>>>> Core(s) per socket: 10
> >>>>>>>> Socket(s): 2
> >>>>>>>> NUMA node(s): 2
> >>>>>>>> Vendor ID: GenuineIntel
> >>>>>>>> CPU family: 6
> >>>>>>>> Model: 85
> >>>>>>>> Model name: Intel(R) Xeon(R) Silver 4114 CPU @
> >> 2.20GHz
> >>>>>>>> Stepping: 4
> >>>>>>>> CPU MHz: 800.479
> >>>>>>>> BogoMIPS: 4400.00
> >>>>>>>> Virtualization: VT-x
> >>>>>>>> L1d cache: 32K
> >>>>>>>> L1i cache: 32K
> >>>>>>>> L2 cache: 1024K
> >>>>>>>> L3 cache: 14080K
> >>>>>>>> NUMA node0 CPU(s):
> >>>>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> >>>>>>>> NUMA node1 CPU(s):
> >>>>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
> >>>>>>>> Flags: fpu vme de pse tsc msr pae mce cx8 apic
> >> sep
> >>>>>> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> >> tm
> >>>> pbe
> >>>>>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> >>>>>> rep_good nopl xtopology nonstop_tsc
> >>>>>>>> cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
> >>>>>> tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
> >>>>>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
> >>>>>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
> >>>>>>>> invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp
> >> tpr_shadow
> >>>>>> vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2
> >> smep
> >>>>>> bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
> >>>>>> clflushopt clwb intel_pt avx512cd avx512bw
> >>>>>>>> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
> >>>>>> cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear
> >>>>>> flush_l1d
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *[root@mn tmp]$ cat /proc/meminfo*
> >>>>>>>>
> >>>>>>>> MemTotal: 83886080 kB
> >>>>>>>> MemFree: 10535812 kB
> >>>>>>>> MemAvailable: 80159744 kB
> >>>>>>>> Buffers: 0 kB
> >>>>>>>> Cached: 69623932 kB
> >>>>>>>> SwapCached: 0 kB
> >>>>>>>> Active: 6625436 kB
> >>>>>>>> Inactive: 63737872 kB
> >>>>>>>> Active(anon): 324456 kB
> >>>>>>>> Inactive(anon): 417184 kB
> >>>>>>>> Active(file): 6300980 kB
> >>>>>>>> Inactive(file): 63320688 kB
> >>>>>>>> Unevictable: 0 kB
> >>>>>>>> Mlocked: 5320 kB
> >>>>>>>> SwapTotal: 8388604 kB
> >>>>>>>> SwapFree: 8381988 kB
> >>>>>>>> Dirty: 220 kB
> >>>>>>>> Writeback: 0 kB
> >>>>>>>> AnonPages: 13626052 kB
> >>>>>>>> Mapped: 186368 kB
> >>>>>>>> Shmem: 1800 kB
> >>>>>>>> KReclaimable: 4134960 kB
> >>>>>>>> Slab: 0 kB
> >>>>>>>> SReclaimable: 0 kB
> >>>>>>>> SUnreclaim: 0 kB
> >>>>>>>> KernelStack: 16720 kB
> >>>>>>>> PageTables: 56500 kB
> >>>>>>>> NFS_Unstable: 0 kB
> >>>>>>>> Bounce: 0 kB
> >>>>>>>> WritebackTmp: 0 kB
> >>>>>>>> CommitLimit: 74123472 kB
> >>>>>>>> Committed_AS: 18337812 kB
> >>>>>>>> VmallocTotal: 34359738367 kB
> >>>>>>>> VmallocUsed: 415836 kB
> >>>>>>>> VmallocChunk: 0 kB
> >>>>>>>> Percpu: 39552 kB
> >>>>>>>> HardwareCorrupted: 0 kB
> >>>>>>>> AnonHugePages: 0 kB
> >>>>>>>> ShmemHugePages: 0 kB
> >>>>>>>> ShmemPmdMapped: 0 kB
> >>>>>>>> CmaTotal: 0 kB
> >>>>>>>> CmaFree: 0 kB
> >>>>>>>> HugePages_Total: 0
> >>>>>>>> HugePages_Free: 0
> >>>>>>>> HugePages_Rsvd: 0
> >>>>>>>> HugePages_Surp: 0
> >>>>>>>> Hugepagesize: 2048 kB
> >>>>>>>> Hugetlb: 0 kB
> >>>>>>>> DirectMap4k: 816396 kB
> >>>>>>>> DirectMap2M: 82513920 kB
> >>>>>>>> DirectMap1G: 52428800 kB
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list