[MonetDB-users] Very Slow Selects
We're using MonetDB and hitting it fairly heavily with read queries. We only use importing of text files to insert information, and that happens every 30 seconds or so. When we're only inserting information, we consume less than 1% of the CPU and things run just fine. We are also able to occasionally query the DB for various pieces of information (counts, averages, etc.) and it returns quite quickly.

The problem is that after we've been running for 15-20 minutes of importing and querying heavily, performance drops to where queries take multiple minutes to finish. These queries normally finish in less than half a second.

Does anyone have any ideas on how we can best track down this issue? Is there any more information that I need to provide?

Thanks,

Joseph Brower
Is your import actually committed? If you see the sqllog directory increase a lot over time, then you know it's not committed.
Stefan
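Stefan's check (watch whether the sqllog directory keeps growing) can be scripted. A minimal sketch in plain Python; the demo below uses a throwaway directory as a stand-in, since the real sqllog location depends on your dbfarm path:

```python
import os
import tempfile

def dir_size_bytes(path):
    """Total size in bytes of all files under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Demo with a temporary directory; in practice, point this at the
# sqllog directory of your dbfarm and sample it periodically.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "log.1"), "wb") as f:
        f.write(b"x" * 1024)
    print(dir_size_bytes(tmp))  # 1024
```

Sampling this once a minute and comparing successive values tells you whether the log is being drained (committed) or only accumulating.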
On 2012-02-29 01:06, Joseph Brower wrote:
How are you importing? Are you by any chance using INSERT INTO queries for each tuple you're inserting? And are you using this in autocommit mode (the default in MonetDB)? If so, this combination is killing your performance.

By far the fastest way of importing data is by using COPY INTO. See http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto for information on syntax. The second best way (way slower on a multi-core system than the fastest) is to use INSERT INTO queries in one big transaction. Before starting to insert, do START TRANSACTION;, and afterwards do COMMIT;.

To give an indication, if you're doing INSERT INTO queries in autocommit mode, that means (at least) one disk write per query. Disks can do maybe 100 I/O operations per second (IOPS), depending on the type of interface and the speed of the disk. This means each query will take at least 0.01 second and you will get at most 100 inserts per second. There is nothing you can do about that apart from cheating (or using hardware that is way faster and hence much more expensive, such as an SSD).

So the best way of avoiding this limit is to do fewer database transactions. To further improve speed, you need to parallelize the import. We can't do that with INSERT INTO queries, but we *can* do that with COPY INTO. Therefore, COPY INTO is the fastest.

-- Sjoerd Mullender
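The back-of-the-envelope arithmetic above can be made concrete. A sketch with illustrative numbers (the ~100 IOPS figure is the email's own ballpark, not a measured value):

```python
def insert_rows_per_second(total_rows, commits, iops=100):
    """Lower-bound cost model from the email: every commit costs at
    least one synchronous disk write, so commits are capped at `iops`
    per second. Ignores all other per-row work, so real throughput
    will be lower than this bound."""
    return total_rows * iops / commits

# Autocommit: one commit per INSERT -> capped at ~100 rows/s.
print(insert_rows_per_second(10_000, commits=10_000))  # 100.0
# One big transaction (START TRANSACTION ... COMMIT): a single commit,
# so the commit bound effectively vanishes and other costs dominate.
print(insert_rows_per_second(10_000, commits=1))       # 1000000.0
```

The point of the model is the ratio, not the absolute numbers: moving 10,000 rows from 10,000 commits to 1 commit removes a 100 rows/s ceiling entirely.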
We are using COPY INTO. It works wonderfully. :-D We still suffer from the slow selects, though.
On 02/29/2012 10:49 AM, Joseph Brower wrote:
We are using COPY INTO. It works wonderfully. :-D We still suffer from the slow selects, though.

To answer why requires a lot more knowledge about your application setting. You might start by looking at the TRACE SQL statement for hints on what is going on in your queries. Firing tons of concurrent queries may also lead to resource 'fights'.
OK. Per someone else's suggestion, we checked the sqllog. It's only 65 MB, and is growing very slowly, so I don't think that is a concern.

The following is an excerpt from top. As you can tell, VIRT is quite high.

VIRT 57.9g RES 7.4g SHR 6.0g %CPU 117 %MEM 47.5

Thanks again for taking a look at this. I'm unfamiliar with TRACE. How exactly do I go about using that?

Take care!

Joseph Brower
On Wed, Feb 29, 2012 at 11:15:21AM -0700, Joseph Brower wrote:
OK. Per someone else's suggestion, we checked the sqllog. It's only 65 MB, and is growing very slowly, so I don't think that is a concern.
The following is an excerpt from top. As you can tell VIRT is quite high.
That is to be expected if you access a large database.
VIRT 57.9g RES 7.4g SHR 6.0g %CPU 117 %MEM 47.5
Thanks again for taking a look at this. I'm unfamiliar with the TRACE. How exactly do I go about using that?
cf.
http://www.monetdb.org/Documentation/Manuals/SQLreference/Runtime
http://www.monetdb.org/Documentation/Manuals/SQLreference/Trace

Stefan
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)  |
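As a side note on the top excerpt quoted above: a quick sanity check (a sketch, assuming top's %MEM is simply RES divided by total physical RAM, which is how procps computes it) puts the machine at roughly 16 GB of RAM, so the 7.4 GB resident set alone is about half of memory. The much larger VIRT figure is dominated by memory-mapped files, which is normal for MonetDB:

```python
# Figures copied from the top excerpt in this thread.
res_gib = 7.4    # RES: resident set size
mem_pct = 47.5   # %MEM: RES as a percentage of physical RAM

# Invert %MEM = RES / total_ram to estimate installed RAM.
total_ram_gib = res_gib / (mem_pct / 100.0)
print(round(total_ram_gib, 1))  # 15.6
```

With roughly half of a ~16 GB machine already resident, heavy concurrent scans can start evicting each other's data, which is consistent with queries slowing down over time under combined import and query load.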
That's what I figured regarding the memory consumption. We're tinkering around with TRACE and hopefully that will yield some results.

Thanks,

Joseph Brower
participants (6)
- Joseph Brower
- Joseph Brower
- Martin Kersten
- Sjoerd Mullender
- Stefan de Konink
- Stefan Manegold