[MonetDB-users] constant bulk inserts to monetdb
Do you happen to be a Ruby-on-Rails expert?
I am not a Ruby on Rails expert; sorry, I am a Python guy :)
Thanks again.
Martin Kersten wrote:
uriel katz wrote:
I have an application with constant bulk inserts (about 50,000 rows of 1 KB each) every minute. I am wondering what the fastest and most efficient way to insert the data is.
COPY INTO is the fastest.
uriel katz wrote:
OK, but why doesn't it utilize one core to the max? It seems to stay at about 8% of one core and takes about 17 seconds to insert 200,000 records of 1 KB each (the actual file is 64 MB).
That depends on the total setup. Especially when you are exceeding memory, Linux is not the best operating system. One of the issues is that dirty pages are not properly flushed; a BATsave at critical points in the new bulk loader helped to avoid this case. If you do a COPY into a clean BAT, then the log overhead is negligible.
Can I somehow skip the logs and make it insert right into the BAT files? Also, how can I stream the tuples programmatically from C MAPI or ODBC?
See the C MAPI library. Or you might look at Stethoscope, which streams tuples from the server to an application.
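For instance, a minimal C MAPI client might look like the sketch below; the host, port, credentials, table, and column names are illustrative placeholders, not something from this thread. INSERT or COPY statements are sent through mapi_query() the same way.

#include <stdio.h>
#include <mapi.h>

int main(void)
{
    /* 50000 is MonetDB's default port; credentials and database name
       are placeholders for a local test setup. */
    Mapi dbh = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                            "sql", "demo");
    if (dbh == NULL || mapi_error(dbh)) {
        if (dbh)
            mapi_explain(dbh, stderr);
        else
            fprintf(stderr, "connection failed\n");
        return 1;
    }

    /* Send a statement and stream the result rows back one by one. */
    MapiHdl hdl = mapi_query(dbh, "SELECT id, payload FROM mytable");
    if (hdl == NULL || mapi_error(dbh)) {
        mapi_explain(dbh, stderr);
        mapi_destroy(dbh);
        return 1;
    }
    while (mapi_fetch_row(hdl)) {
        printf("%s\t%s\n",
               mapi_fetch_field(hdl, 0),
               mapi_fetch_field(hdl, 1));
    }

    mapi_close_handle(hdl);
    mapi_destroy(dbh);
    return 0;
}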
I am running MonetDB on Windows with 8 GB of RAM.
That should be more than enough ;)
What is Stethoscope?
It is part of the source distribution and a Linux utility that picks up a stream of tuples from the server.
Does it work on Windows? And how does it work: using a special MIL/MAL
It is not compiled on Windows. The source will give you an indication of how it could work.
syntax, or is it a plugin in the server for fast loads?
Also, how do you guys load data into benchmark databases, which are (according to the papers I have seen) 1-100 GB of data?
The large database we recently loaded is the SkyServer application. The small version is 150 GB, which took 1.5 hours to load. The large one is a 2.6 TB database, loaded with no major problems.
The information you provide is hard to track down to a possible cause. A COPY operation most likely reads the complete file into its buffers before processing it.
One pitfall you may have stumbled upon is the following. Did you indicate the number of records that you are about to copy into the table? If not, then the system has to guess and will repeatedly adjust this guess, which involves quite some overhead. Please use: COPY 50000 RECORDS INTO .....
Even if I want to read the whole file?
If you know the number of tuples up front, please tell it to the COPY command. It allows the system to claim enough space in one step. In general, we do not encourage low-level interactions. Often the cause of the problem is at a different level: e.g. the number of BATs, whether strings are involved that screw up a hash, whether the I/O channels are properly used, what the bandwidth is, what the OS settings are, etc.
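As a sketch of that advice in practice, a client could submit the COPY with an explicit record count as shown below. The table name, file path, and connection details are placeholders; note that the file path is resolved on the server side, so the file must be readable by the server process.

#include <stdio.h>
#include <mapi.h>

int main(void)
{
    /* Placeholder connection parameters for a default local setup. */
    Mapi dbh = mapi_connect("localhost", 50000, "monetdb", "monetdb",
                            "sql", "demo");
    if (dbh == NULL || mapi_error(dbh)) {
        if (dbh)
            mapi_explain(dbh, stderr);
        return 1;
    }

    /* Announcing 50000 records up front lets the system claim enough
       space once instead of repeatedly adjusting its guess. */
    MapiHdl hdl = mapi_query(dbh,
        "COPY 50000 RECORDS INTO mytable "
        "FROM '/tmp/batch.csv' USING DELIMITERS ',', '\\n'");
    if (hdl == NULL || mapi_error(dbh))
        mapi_explain(dbh, stderr);

    if (hdl)
        mapi_close_handle(hdl);
    mapi_destroy(dbh);
    return 0;
}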
participants (2)
- Martin Kersten
- uriel katz