Hai Ariel, Just a side question, why are you creating msqldump of your data to reload into MonetDB. Why don’t you immediately prepare CSV data and do the bulk loading? Regards, Jennie
On 31 Mar 2020, at 18:50, Ariel Abadi
wrote: Hi!
Firstly, I would like to thank for the prompt answer!!!
I will wait your answer Panos, to see wether is a bug or not.
In the case this is NOT a bug, and you confirm what Arjen said regarding the format, I think it would be interesting to add such comment on the documentation (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData) which does not refer anything related to the format of the "titles". Also, it would be great to use the OFFSET clause disregards of the format of the remainder.
Again, thank you very much to both!!!
Ariel PS. The way I solved in this case, was to cut the first 222 rows of each file, and I imported using the Bulk load, which worked perfectly
On Tue, Mar 31, 2020 at 7:16 AM Panagiotis Koutsourakis
wrote: Hi Arjen, please see my more recent email about the subject.
Best regards, Panos.
On 3/31/20 11:13 AM, Arjen P. de Vries wrote:
Hi Panos,
Great assistance here!
One thing: I can understand why Ariel tried to use OFFSET for that purpose, maybe nice to add to the documentation that skipped lines also should follow the format of the remainder!
Greetings,
Arjen
On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The COPY INTO command recognizes a series of records of the form:
<data><field delimiter><data><field delimiter>...<data><record delimiter>
For example the command
COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';
will recognize lines of the form:
1 "MIA" 7 "MIA|115331571990001804" 2000-08-01
but nothing else. Specifically the first 222 lines in the monet_list.txt file be rejected by COPY INTO. You need to delete those lines before you try to load data with copy into.
Take a look at the documentation of COPY INTO in the MonetDB web page [1].
Hope this helps, Panos.
[1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads
Hi Panos!
Enclosed you can find the file (truncated) that I used and the statement I run on mclient.
COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n','"';
Thks Ariel
On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The first issue is not totally surprising to me. My best guess is that reading from a file is using buffered input while reading from stdin is not. See for example [1] and [2]. I am not sure there is a lot we can do at the MonetDB side about this.
About the second issue could you please post the exact COPY INTO command and the first few lines of the file you are trying to load so that we can figure out the exact issue?
Best regards, Panos.
[1]
https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-...
[2]
https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/172...
Hi!
After a big battle against the DB, I found 2 issues.
1) Running the COPY RECORDS FROM *STDIN *and from *FILE*. Is totally different in terms of performance. While the first one last 24 minutes,
On 3/26/20 11:51 AM, Ariel Abadi wrote: the
second one last only *90 secs.* 2) As I had generated all those files thru msqldump (which it states
"CREATE TABLE" instruction and then the COPY RECORDS)... I try ro run
On 3/30/20 7:42 PM, Ariel Abadi wrote: the the
COPY *XXXX OFFSET *RECORDS (in order to avoid the create table statement)... and the query, because of the word "STDIN" in the file. It was not OFFSETTING all the rows I was asking for.
I hope I've been clear... if not please let me know!
Ariel
On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten
wrote:
> Hi > > I am not sure what issue you are now dealing with, but since you are a > company > that seems to use (or considering) MonetDB in your commercial offerings, > I gladly refer you to MonetDBSolutions(*), where we can give you the > commercial > support needed to diagnose your issues. > > regards Martin > > (*) https://www.monetdbsolutions.com/ > > On 24/03/2020 23:55, Ariel Abadi wrote: >> Thanks Martin for your prompt response >> But I did and msqldump from other database, and now the upload is being > done thru the BULK LOAD. >> >> /COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" FROM stdin USING > DELIMITERS '\t','\n','"';/ >> >> >> Any other idea ? :'( >> >> >> >> >> On Tue, Mar 24, 2020 at 7:37 PM Martin Kersten < martin.kersten@cwi.nl > mailto:martin.kersten@cwi.nl> wrote: >> >> Hi, >> >> given the numbers I suspect you are loading them as SQL insert > statements under autocommit mode, >> Please read the documentation on Bulk loading. >> >> regards,Martin >> Sent from my iPad >> >>> On 24 Mar 2020, at 23:12, Ariel Abadi
aabadi@starbi.com>> wrote: >>> >>> >>> Hi! >>> Maybe somebody can help me, please! >>> >>> I'm trying to upload into Monet a file with 14MM records. Is taking > more than *24 minutes.* >>> It is a fresh new database. >>> >>> Enclosed you will find the server configuration. >>> >>> Monetdb Version Nov2019-SP3 >>> >>> >>> *[root@mn tmp]$ lscpu* >>> Architecture: x86_64 >>> CPU op-mode(s): 32-bit, 64-bit >>> Byte Order: Little Endian >>> CPU(s): 40 >>> On-line CPU(s) list: 0-39 >>> Thread(s) per core: 2 >>> Core(s) per socket: 10 >>> Socket(s): 2 >>> NUMA node(s): 2 >>> Vendor ID: GenuineIntel >>> CPU family: 6 >>> Model: 85 >>> Model name: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz >>> Stepping: 4 >>> CPU MHz: 800.479 >>> BogoMIPS: 4400.00 >>> Virtualization: VT-x >>> L1d cache: 32K >>> L1i cache: 32K >>> L2 cache: 1024K >>> L3 cache: 14080K >>> NUMA node0 CPU(s): > 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38 >>> NUMA node1 CPU(s): > 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39 >>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep > mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe > syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts > rep_good nopl xtopology nonstop_tsc >>> cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est > tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe > popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm > 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 >>> invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow > vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep > bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap > clflushopt clwb intel_pt avx512cd avx512bw >>> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc > cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear > flush_l1d >>> >>> >>> *[root@mn tmp]$ cat /proc/meminfo* >>> >>> MemTotal: 83886080 kB >>> MemFree: 10535812 kB >>> MemAvailable: 80159744 kB >>> Buffers: 0 kB >>> Cached: 69623932 kB >>> SwapCached: 0 kB >>> Active: 6625436 kB >>> Inactive: 63737872 kB >>> Active(anon): 324456 kB >>> Inactive(anon): 417184 kB >>> Active(file): 6300980 kB >>> Inactive(file): 63320688 kB >>> Unevictable: 0 kB >>> Mlocked: 5320 kB >>> SwapTotal: 8388604 kB >>> SwapFree: 8381988 kB >>> Dirty: 220 kB >>> Writeback: 0 kB >>> AnonPages: 13626052 kB >>> Mapped: 186368 kB >>> Shmem: 1800 kB >>> KReclaimable: 4134960 kB >>> Slab: 0 kB >>> SReclaimable: 0 kB >>> SUnreclaim: 0 kB >>> KernelStack: 16720 kB >>> PageTables: 56500 kB >>> NFS_Unstable: 0 kB >>> Bounce: 0 kB >>> WritebackTmp: 0 kB >>> CommitLimit: 74123472 kB >>> Committed_AS: 18337812 kB >>> VmallocTotal: 34359738367 kB >>> VmallocUsed: 415836 kB >>> VmallocChunk: 0 kB >>> Percpu: 39552 kB >>> HardwareCorrupted: 0 kB >>> AnonHugePages: 0 kB >>> ShmemHugePages: 0 kB >>> ShmemPmdMapped: 0 kB >>> CmaTotal: 0 kB >>> CmaFree: 0 kB >>> HugePages_Total: 0 >>> HugePages_Free: 0 >>> HugePages_Rsvd: 0 >>> HugePages_Surp: 0 >>> Hugepagesize: 2048 kB >>> Hugetlb: 0 kB >>> DirectMap4k: 816396 kB >>> DirectMap2M: 82513920 kB >>> DirectMap1G: 52428800 kB