[MonetDB-users] Fwd: Re: DB Corruption
This is what one of my coworkers reports about the problem. both columns are varchar(24). We have confirmed that there are no null values. We have a theory… Approximately every 6 hours, monetdb stops responding to queries and starts thrashing the disk. This behavior has been confirmed without any inserts. Because of this behavior, we routinely stop and start the service. After a restart, the service returns to a normal state. This appears to be the core problem. Our guess is that we restarted the service in the middle of a massive import. We will make sure this doesn't happen again, however, we still cannot find any sort of resolution as to why monetdb routinely starts thrashing the disk (on the order of writing 150MB/s) without any inserts, only select queries. Sorry to change the topic, but we're moving on the assumption we corrupted our db by restarting the service in the middle of a massive "INSERT INTO FROM" query. Can you think of any reason why monetdb would need to write to the disk in such an aggressive manner while ONLY serving select queries? This appears to be the core problem. Thanks On Mar 26, 2012, at 9:23 AM, Joseph Brower wrote:
What would you like me to tell them?
-------- Original Message -------- Subject: Re: [MonetDB-users] DB Corruption Date: Sun, 25 Mar 2012 09:15:24 +0200 From: Martin Kersten
Reply-To: Martin.Kersten@cwi.nl, Communication channel for MonetDB users To: Communication channel for MonetDB users What is the column type? Is the situation repeatable using a fresh database?
regards, Martin
On 3/25/12 6:34 AM, Joseph Brower wrote:
We just checked. There are no nulls in the database at all. Additionally, there is only one column that has a different count.
Thanks,
Joseph Brower
On 03/24/2012 09:58 AM, Stefan Manegold wrote:
Joseph,
since you did not specify your queries in more detail, I assume they are as simple as (1) select count(*) from
; and (2) select count(single_column) from ; In that case, you could try the following sanity check:
(3) select count(*) from
where single_column is null; and check whether that returns the difference between your counts.
If that indeed returns the difference between your global count(*) and count(single_column), this would confirm the existence of NULL values, and you'd need to invest which of the values in your CVS file have been interpreted as NULL and why.
In case the result of queries (2) and (3) do not add up to the number returned by query (1), you might want to check the output(s) of
TRACE select count(*) from
; TRACE select count(single_column) from ; TRACE select count(*) from where single_column is null; (or have us check them, provided/once we find the time).
Stefan
On Sat, Mar 24, 2012 at 09:37:38AM -0600, Joseph Brower wrote:
There shouldn't be any nulls. The records were all added by a "Copy Into" command. The files that we imported had no nulls in it at all.
On 03/24/2012 01:57 AM, Sjoerd Mullender wrote:
On 2012-03-24 00:14, Joseph Brower wrote:
We were troubleshooting some odd results coming back from our database, and found that it's missing some entries from a column. This appears to be causing some sort of offset. Does anyone have any ideas on what can be done to resolve it or what may have caused it? Is there any information we could send that would be helpful in troubleshooting this or for a bug report?
If we do a count(*) and then a count('single_column') we are getting two different results (when they should be indentical) count (*) counts all rows, whereas count(single_column) only count non-null values. Are there any nulls in the single_column?
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Mon, Mar 26, 2012 at 09:48:02AM -0600, Joseph Brower wrote:
This is what one of my coworkers reports about the problem.
both columns are varchar(24).
We have confirmed that there are no null values.
We have a theory…
Approximately every 6 hours, monetdb stops responding to queries and starts thrashing the disk. This behavior has been confirmed without any inserts. Because of this behavior, we routinely stop and start the service. After a restart, the service returns to a normal state. This appears to be the core problem. Our guess is that we restarted the service in the middle of a massive import. We will make sure this doesn't happen again, however, we still cannot find any sort of resolution as to why monetdb routinely starts thrashing the disk (on the order of writing 150MB/s) without any inserts, only select queries. After a period of 5 minutes without updates the monetdb logs are flushed, which may lead to lots of bat (and corresponding files) to get flushed to disk. Could be the cause of your reported problems here.
Niels
Sorry to change the topic, but we're moving on the assumption we corrupted our db by restarting the service in the middle of a massive "INSERT INTO FROM" query.
Can you think of any reason why monetdb would need to write to the disk in such an aggressive manner while ONLY serving select queries? This appears to be the core problem.
Thanks
On Mar 26, 2012, at 9:23 AM, Joseph Brower wrote:
What would you like me to tell them?
-------- Original Message -------- Subject: Re: [MonetDB-users] DB Corruption Date: Sun, 25 Mar 2012 09:15:24 +0200 From: Martin Kersten
Reply-To: Martin.Kersten@cwi.nl, Communication channel for MonetDB users To: Communication channel for MonetDB users What is the column type? Is the situation repeatable using a fresh database?
regards, Martin
On 3/25/12 6:34 AM, Joseph Brower wrote: > We just checked. There are no nulls in the database at all. > Additionally, there is only one column that has a different count. > > Thanks, > > Joseph Brower > > On 03/24/2012 09:58 AM, Stefan Manegold wrote: >> Joseph, >> >> since you did not specify your queries in more detail, I assume they are as >> simple as >> (1) select count(*) from
; >> and >> (2) select count(single_column) from ; >> >> In that case, you could try the following sanity check: >> >> (3) select count(*) from where single_column is null; >> >> and check whether that returns the difference between your counts. >> >> If that indeed returns the difference between your global count(*) and >> count(single_column), this would confirm the existence of NULL values, and >> you'd need to invest which of the values in your CVS file have been >> interpreted as NULL and why. >> >> In case the result of queries (2) and (3) do not add up to the number >> returned by query (1), you might want to check the output(s) of >> >> TRACE select count(*) from ; >> TRACE select count(single_column) from ; >> TRACE select count(*) from where single_column is null; >> >> (or have us check them, provided/once we find the time). >> >> Stefan >> >> On Sat, Mar 24, 2012 at 09:37:38AM -0600, Joseph Brower wrote: >>> There shouldn't be any nulls. The records were all added by a "Copy >>> Into" command. The files that we imported had no nulls in it at >>> all. >>> >>> On 03/24/2012 01:57 AM, Sjoerd Mullender wrote: >>>> On 2012-03-24 00:14, Joseph Brower wrote: >>>>> We were troubleshooting some odd results coming back from our database, >>>>> and found that it's missing some entries from a column. This appears to >>>>> be causing some sort of offset. Does anyone have any ideas on what can >>>>> be done to resolve it or what may have caused it? Is there any >>>>> information we could send that would be helpful in troubleshooting this >>>>> or for a bug report? >>>>> >>>>> If we do a count(*) and then a count('single_column') we are getting two >>>>> different results (when they should be indentical) >>>> count (*) counts all rows, whereas count(single_column) only count >>>> non-null values. Are there any nulls in the single_column? >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> This SF email is sponsosred by: >>>> Try Windows Azure free for 90 days Click Here >>>> http://p.sf.net/sfu/sfd2d-msazure >>>> >>>> >>>> _______________________________________________ >>>> MonetDB-users mailing list >>>> MonetDB-users@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >>> ------------------------------------------------------------------------------ >>> This SF email is sponsosred by: >>> Try Windows Azure free for 90 days Click Here >>> http://p.sf.net/sfu/sfd2d-msazure >>> _______________________________________________ >>> MonetDB-users mailing list >>> MonetDB-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >> > > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > MonetDB-users mailing list > MonetDB-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/monetdb-users ------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
We thought that to, but sometimes it will go on for 30 or 40 minutes, yet disk consumption doesn't really grow at all. I'd think if it was flushing to the disk, it would consume more space. Also, if that was the case, how would we address it? Thanks, Joseph Brower On 03/26/2012 10:35 AM, Niels Nes wrote:
On Mon, Mar 26, 2012 at 09:48:02AM -0600, Joseph Brower wrote:
This is what one of my coworkers reports about the problem.
both columns are varchar(24).
We have confirmed that there are no null values.
We have a theory...
Approximately every 6 hours, monetdb stops responding to queries and starts thrashing the disk. This behavior has been confirmed without any inserts. Because of this behavior, we routinely stop and start the service. After a restart, the service returns to a normal state. This appears to be the core problem. Our guess is that we restarted the service in the middle of a massive import. We will make sure this doesn't happen again, however, we still cannot find any sort of resolution as to why monetdb routinely starts thrashing the disk (on the order of writing 150MB/s) without any inserts, only select queries. After a period of 5 minutes without updates the monetdb logs are flushed, which may lead to lots of bat (and corresponding files) to get flushed to disk. Could be the cause of your reported problems here.
Niels
Sorry to change the topic, but we're moving on the assumption we corrupted our db by restarting the service in the middle of a massive "INSERT INTO FROM" query.
Can you think of any reason why monetdb would need to write to the disk in such an aggressive manner while ONLY serving select queries? This appears to be the core problem.
Thanks
On Mar 26, 2012, at 9:23 AM, Joseph Brower wrote:
What would you like me to tell them?
-------- Original Message -------- Subject: Re: [MonetDB-users] DB Corruption Date: Sun, 25 Mar 2012 09:15:24 +0200 From: Martin Kersten
Reply-To: Martin.Kersten@cwi.nl, Communication channel for MonetDB users To: Communication channel for MonetDB users What is the column type? Is the situation repeatable using a fresh database?
regards, Martin
On 3/25/12 6:34 AM, Joseph Brower wrote: > We just checked. There are no nulls in the database at all. > Additionally, there is only one column that has a different count. > > Thanks, > > Joseph Brower > > On 03/24/2012 09:58 AM, Stefan Manegold wrote: >> Joseph, >> >> since you did not specify your queries in more detail, I assume they are as >> simple as >> (1) select count(*) from
; >> and >> (2) select count(single_column) from ; >> >> In that case, you could try the following sanity check: >> >> (3) select count(*) from where single_column is null; >> >> and check whether that returns the difference between your counts. >> >> If that indeed returns the difference between your global count(*) and >> count(single_column), this would confirm the existence of NULL values, and >> you'd need to invest which of the values in your CVS file have been >> interpreted as NULL and why. >> >> In case the result of queries (2) and (3) do not add up to the number >> returned by query (1), you might want to check the output(s) of >> >> TRACE select count(*) from ; >> TRACE select count(single_column) from ; >> TRACE select count(*) from where single_column is null; >> >> (or have us check them, provided/once we find the time). >> >> Stefan >> >> On Sat, Mar 24, 2012 at 09:37:38AM -0600, Joseph Brower wrote: >>> There shouldn't be any nulls. The records were all added by a "Copy >>> Into" command. The files that we imported had no nulls in it at >>> all. >>> >>> On 03/24/2012 01:57 AM, Sjoerd Mullender wrote: >>>> On 2012-03-24 00:14, Joseph Brower wrote: >>>>> We were troubleshooting some odd results coming back from our database, >>>>> and found that it's missing some entries from a column. This appears to >>>>> be causing some sort of offset. Does anyone have any ideas on what can >>>>> be done to resolve it or what may have caused it? Is there any >>>>> information we could send that would be helpful in troubleshooting this >>>>> or for a bug report? >>>>> >>>>> If we do a count(*) and then a count('single_column') we are getting two >>>>> different results (when they should be indentical) >>>> count (*) counts all rows, whereas count(single_column) only count >>>> non-null values. Are there any nulls in the single_column? >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> This SF email is sponsosred by: >>>> Try Windows Azure free for 90 days Click Here >>>> http://p.sf.net/sfu/sfd2d-msazure >>>> >>>> >>>> _______________________________________________ >>>> MonetDB-users mailing list >>>> MonetDB-users@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >>> ------------------------------------------------------------------------------ >>> This SF email is sponsosred by: >>> Try Windows Azure free for 90 days Click Here >>> http://p.sf.net/sfu/sfd2d-msazure >>> _______________________________________________ >>> MonetDB-users mailing list >>> MonetDB-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >> > > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > MonetDB-users mailing list > MonetDB-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/monetdb-users ------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Mon, Mar 26, 2012 at 02:12:36PM -0600, Joseph Brower wrote:
We thought that to, but sometimes it will go on for 30 or 40 minutes, yet disk consumption doesn't really grow at all. I'd think if it was flushing to the disk, it would consume more space. Also, if that was the case, how would we address it?
Joseph To see if this is the case you can check the log directory (usualy located under the dbfarm (sql_logs). If during trashing files in this directory change, then this is surely the case. If not we should look for another cause. Niels
Thanks,
Joseph Brower
On 03/26/2012 10:35 AM, Niels Nes wrote:
On Mon, Mar 26, 2012 at 09:48:02AM -0600, Joseph Brower wrote:
This is what one of my coworkers reports about the problem.
both columns are varchar(24).
We have confirmed that there are no null values.
We have a theory…
Approximately every 6 hours, monetdb stops responding to queries and starts thrashing the disk. This behavior has been confirmed without any inserts. Because of this behavior, we routinely stop and start the service. After a restart, the service returns to a normal state. This appears to be the core problem. Our guess is that we restarted the service in the middle of a massive import. We will make sure this doesn't happen again, however, we still cannot find any sort of resolution as to why monetdb routinely starts thrashing the disk (on the order of writing 150MB/s) without any inserts, only select queries.
After a period of 5 minutes without updates the monetdb logs are flushed, which may lead to lots of bat (and corresponding files) to get flushed to disk. Could be the cause of your reported problems here.
Niels
Sorry to change the topic, but we're moving on the assumption we corrupted our db by restarting the service in the middle of a massive "INSERT INTO FROM" query.
Can you think of any reason why monetdb would need to write to the disk in such an aggressive manner while ONLY serving select queries? This appears to be the core problem.
Thanks
On Mar 26, 2012, at 9:23 AM, Joseph Brower wrote:
What would you like me to tell them?
-------- Original Message -------- Subject: Re: [MonetDB-users] DB Corruption Date: Sun, 25 Mar 2012 09:15:24 +0200 From: Martin Kersten
Reply-To: Martin.Kersten@cwi.nl, Communication channel for MonetDB users To: Communication channel for MonetDB users What is the column type? Is the situation repeatable using a fresh database?
regards, Martin
On 3/25/12 6:34 AM, Joseph Brower wrote: > We just checked. There are no nulls in the database at all. > Additionally, there is only one column that has a different count. > > Thanks, > > Joseph Brower > > On 03/24/2012 09:58 AM, Stefan Manegold wrote: >> Joseph, >> >> since you did not specify your queries in more detail, I assume they are as >> simple as >> (1) select count(*) from
; >> and >> (2) select count(single_column) from ; >> >> In that case, you could try the following sanity check: >> >> (3) select count(*) from where single_column is null; >> >> and check whether that returns the difference between your counts. >> >> If that indeed returns the difference between your global count(*) and >> count(single_column), this would confirm the existence of NULL values, and >> you'd need to invest which of the values in your CVS file have been >> interpreted as NULL and why. >> >> In case the result of queries (2) and (3) do not add up to the number >> returned by query (1), you might want to check the output(s) of >> >> TRACE select count(*) from ; >> TRACE select count(single_column) from ; >> TRACE select count(*) from where single_column is null; >> >> (or have us check them, provided/once we find the time). >> >> Stefan >> >> On Sat, Mar 24, 2012 at 09:37:38AM -0600, Joseph Brower wrote: >>> There shouldn't be any nulls. The records were all added by a "Copy >>> Into" command. The files that we imported had no nulls in it at >>> all. >>> >>> On 03/24/2012 01:57 AM, Sjoerd Mullender wrote: >>>> On 2012-03-24 00:14, Joseph Brower wrote: >>>>> We were troubleshooting some odd results coming back from our database, >>>>> and found that it's missing some entries from a column. This appears to >>>>> be causing some sort of offset. Does anyone have any ideas on what can >>>>> be done to resolve it or what may have caused it? Is there any >>>>> information we could send that would be helpful in troubleshooting this >>>>> or for a bug report? >>>>> >>>>> If we do a count(*) and then a count('single_column') we are getting two >>>>> different results (when they should be indentical) >>>> count (*) counts all rows, whereas count(single_column) only count >>>> non-null values. Are there any nulls in the single_column? >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> This SF email is sponsosred by: >>>> Try Windows Azure free for 90 days Click Here >>>> http://p.sf.net/sfu/sfd2d-msazure >>>> >>>> >>>> _______________________________________________ >>>> MonetDB-users mailing list >>>> MonetDB-users@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >>> ------------------------------------------------------------------------------ >>> This SF email is sponsosred by: >>> Try Windows Azure free for 90 days Click Here >>> http://p.sf.net/sfu/sfd2d-msazure >>> _______________________________________________ >>> MonetDB-users mailing list >>> MonetDB-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/monetdb-users >> > > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > MonetDB-users mailing list > MonetDB-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/monetdb-users ------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (2)
-
Joseph Brower
-
Niels Nes