[MonetDB-users] Searching for possibilities to reduce the fluctuation in insertion times.
Hi,

We use MonetDB v4.26.4 (Nov 2008-sp2) for our application with the XQuery frontend. Often, queries are executed very fast, but sometimes, execution of the same query takes significantly more time.

For example: we insert an XML document of 1.4 MB with a maximum depth of 4. The document is structured as follows:

<rootelement>
  <item>
    <child1>
      <sha1>sha1-value</sha1>
      <md5>md5-value</md5>
    </child1>
  </item>
  ...
</rootelement>

We have 8k 'item' elements. Often the insertion of this document is performed in ~0.5s, which is fine. But when the system is performing I/O (e.g. copying a large file), the insertion can take up to 6 minutes.

So I have two questions: 1) What is the cause of this large difference in execution times? 2) Is there a way to configure MonetDB (or the server) in such a way that the difference between 'fast' and 'slow' is smaller?

Regards,
John van Schie
Hi,

by insertion of the document, you are referring to shredding (aka pf:add-doc) and not inserting (updating) new elements into an already shredded document, right?

What kind of I/O is the system doing? Do you mean I/O originating from some other process that just takes up all the resources of your system (like copying some hundreds of gigs of data to the disk)? If that is the case, then the problem is just the OS and the scheduling algorithm it uses for managing I/O requests. On the other hand, if this I/O is started because of shredding this small XML document, then that should not happen. Did you shred it as an updatable document? How full is your disk? It might be due to heavy fragmentation on the disk and not enough free space.

lefteris
Hi,

You're correct: by insertion I mean a pf:add-doc() and not updating an existing document. The document is shredded as read-only, and there is ~170 GB available on the disk (out of 400 GB).

The system I/O is caused by copying a 40 GB file from a USB drive to disk (but dd if=/dev/zero gave the same result) and originates from a separate process.

I'll look into the OS I/O scheduling (for Fedora Core 10) and let you know if I find anything.

Regards,
John
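For reproduction, the dd-based background load would be something along these lines (the output path and size are assumptions for illustration, not from the thread):

bash$ dd if=/dev/zero of=/tmp/big.bin bs=1M count=40960   # hypothetical target; ~40 GB of sequential writes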
It is only natural that if you overload the system with some other process, such as copying 40G with dd, other processes will become slower. It has nothing to do with MonetDB; your favorite web browser will also take 3 minutes to open (the USB interface eats up CPU, and dd eats up the I/O channel). Especially if you run a DBMS on a machine, which is already resource-consuming, the system will slow down a lot even with small unrelated processes. That is why a dedicated server is usually better.

I don't think you will be able to find anything about the I/O scheduler of Fedora. Actually, this is part of the Linux kernel and not configurable. The only suggestion that I can make (which I don't know if it works for I/O), if you *must* copy 40 gigs while you are shredding documents, is to run the copy command with "nice", that is:

bash$ nice cp /path/a /path/b

lefteris
I still find it a bit hard to believe that, with the completely fair I/O scheduler (CFQ) of recent kernels, disk I/O can slow a query down from 0.5s to more than 6 minutes.
ionice works approximately the same way for I/O. This seems to solve the problem.

-- John
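A sketch of the kind of invocation meant here (assuming the 40 GB copy is the process to deprioritize; the flags are from util-linux ionice, where class 3 is "idle" and class 2 is "best effort" with priorities 0-7):

bash$ ionice -c3 cp /path/a /path/b        # do the copy only when the disk is otherwise idle
bash$ ionice -c2 -n7 cp /path/a /path/b    # or: run it at the lowest best-effort priority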
You just saw that it can :) Does the 6-minute delay ever appear when you are not copying a 40G file? And you said that ionice works, so there it is :) The I/O scheduler in the kernel does not know anything about the process requesting data; its algorithm will favor writes over reads. See the following link for an overview: http://www.linuxjournal.com/article/6931

I quote from that link:

<================
It gets worse for our friend the read request, however. Because writes are asynchronous, writes tend to stream. That is, it is common for a large writeback of a lot of data to occur. This implies that many individual write requests are submitted to a close area of the hard disk. As an example, consider saving a large file. The application dumps write requests on the system and hard drive as fast as it is scheduled. Read requests, conversely, usually do not stream. Instead, applications submit read requests in small one-by-one chunks, with each chunk dependent on the last. Consider reading all of the files in a directory. The application opens the first file, issues a read request for a suitable chunk of the file, waits for the returned data, issues a read request for the next chunk, waits and continues likewise until the entire file is read. Then the file is closed, the next file is opened and the process repeats. Each subsequent request has to wait for the previous, which means substantial delays to this application if the requests are to far-off disk blocks. The phenomenon of streaming write requests starving dependent read requests is called writes-starving-reads
===============>

So that is probably what you experience: "writes-starving-reads".

lefteris
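As an aside, the scheduler in use can at least be inspected (and, on most kernels of this era, switched) per block device via sysfs; sda is an assumed device name here, and writing to the file needs root:

bash$ cat /sys/block/sda/queue/scheduler       # typically prints e.g.: noop anticipatory deadline [cfq]
bash$ echo deadline > /sys/block/sda/queue/scheduler   # as root; bracketed entry above is the active one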
Hi John,

to get a better impression of what the actual problem might be and what triggers it, it would be great if you could provide us with the necessary data, query/queries, and detailed instructions on how to reproduce the problem. In addition, I have the following questions:

On Mon, Feb 02, 2009 at 10:15:06AM +0100, John van Schie (DT) wrote:
> Hi,
>
> We use MonetDB v4.26.4 (Nov 2008-sp2) for our application with the XQuery frontend. Often, queries are executed very fast, but sometimes, execution of the same query takes significantly more time.
>
> For example: we insert an XML document of 1.4 MB with a maximum depth of 4. The document is structured as follows: <rootelement> <item> <child1> <sha1>sha1-value</sha1> <md5>md5-value</md5> </child1> </item> ... </rootelement> We have 8k 'item' elements.
>
> Often the insertion of this document is performed in ~0.5s, which is fine.
How exactly do you load/insert the document? Using pf:add-doc? Do you add it into a collection of its own, or multiple documents into the same collection? Does it occur (also) when inserting this document into an empty database or a new/empty collection, or (only) when adding the document to a database and/or collection that already contains some data/documents? If the latter, how many documents / how much data had been successfully (and "fast") inserted into the whole database and into the same collection prior to the insert that is unexpectedly slow?

How much memory does your machine have? How large is your Mserver when the slow insert occurs? Are you running on a 64-bit or 32-bit system (both OS & MonetDB)? Which OS are you running on?
> But when the system is performing I/O (e.g. copying a large file), the insertion can take up to 6 minutes.
"Who" is causing/doing the IO? MonetDB or some other process/application? If MonetDB, is it only running your insert query at that time or running any other queries concurrently? Did you also observer slow inserts without the system being busy with IO?
> So I have two questions: 1) What is the cause of this large difference in execution times?
I cannot yet tell without monitoring the system while this happens.
> 2) Is there a way to configure MonetDB (or the server) in such a way that the difference between the 'fast' and 'slow' is smaller?
In principle, no. However, once we understand what is actually happening, we might know more ...

Regards,
Stefan
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
Hi Stefan,

I'm sorry that I forgot to mention the required details.

*Platform*
- Fedora Core 10, x86_64
- MonetDB 4.26.4 64-bit November 2008-SP2 RPM (64-bit OIDs)
- 8 GB RAM
- ~170 GB free

*Query*
pf:add-doc('/tmp/<unique name>.xml', '<unique name>', '<unique name>')
where unique name is a generated name. There is no document or collection with the given name already in the database when the query is executed.

*I/O*
Performed by a separate process, copying 40 GB of data.

*Other details*
The behavior was observed on a non-empty database. I'll try to reproduce it on an empty database so that I can provide you with the requested details. But, frankly, I think I'll have to look at the OS I/O scheduler to fix this.

Regards,
John
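A minimal sketch of how such a run could be reproduced (dummy hash values and hypothetical file/collection names; the real documents and generated names may differ):

bash$ { echo '<rootelement>'; for i in $(seq 1 8000); do echo "<item><child1><sha1>sha1-$i</sha1><md5>md5-$i</md5></child1></item>"; done; echo '</rootelement>'; } > /tmp/testdoc.xml

followed by the query above with those names filled in, e.g. pf:add-doc('/tmp/testdoc.xml', 'testdoc', 'testdoc'), once without and once with a large copy running in the background.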
participants (3)
- John van Schie (DT)
- Lefteris
- Stefan Manegold