Re: monet crashes with query to VERY large table
Hello,

Just a suggestion: if you think the issue is related to the size of the data, you could use a random data generator to reproduce it. We were facing a similar problem with shareable test data, so I wrote a program called tqgen in Go (https://github.com/niocs/tqgen). It can generate large amounts of csv data for testing, and to share the data you only need to share the command used to generate it; anyone else can run the same command to produce the exact same data.

It produces date-wise fictional trade/quote data for fictional stocks. Each date's data goes into a separate file of around 128MB, so running it over a date range of about 1.5 years should generate around 70GB of data. It takes a while to finish, because the data is not completely random; rows are correlated with each other.

To generate the data, you could run:

$ git clone https://github.com/niocs/tqgen
$ cd tqgen
$ go build tqgen.go
$ ./tqgen --OutFilePat /path/to/data/tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20210831

I usually load the files one by one so that I can record load-time stats, etc., but you could use this bash script (https://github.com/niocs/hcsv/blob/master/hcat) to concatenate them without repeating the headers, and then load the result into MonetDB. (I will later add a parameter to tqgen to save to a single file.) To do that:

$ git clone https://github.com/niocs/hcsv
$ hcsv/hcat /path/to/data/tq.20*.csv > /path/to/data/tq.big.csv

If this triggers the same issue, then others can regenerate the same data and try to reproduce it. Hope this helps.

Thanks,
Sahas
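If you'd rather not depend on hcat, the header-skipping concatenation can also be done with standard shell tools, and MonetDB's bulk loader does not need a header row at all. A minimal sketch, assuming a database named testdb and an already-created table tq whose columns match the csv layout (both names are placeholders, not from this thread):

$ # drop the header line from every file, since COPY INTO expects data rows only
$ for f in /path/to/data/tq.20*.csv; do tail -n +2 "$f"; done > /path/to/data/tq.big.csv
$ # bulk load via MonetDB's COPY INTO
$ mclient -d testdb -s "COPY INTO tq FROM '/path/to/data/tq.big.csv' USING DELIMITERS ',','\n','\"';"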
On Jul 14, 2016 10:20 AM, "Anthony Damico" wrote:
can you winnow down the problem enough to construct a reproducible example? thanks
On Thursday, July 14, 2016, Stefano Fioravanzo wrote:
Hello,
No, the data is not public, and testing it on another machine is not an easy option right now.
Regards,
Stefano
On 14 Jul 2016, at 15:49, Anthony Damico wrote:
hi, any chance these data are public? if not, can you test on a separate box and see if the exact same thing happens or if it's hardware-specific?
On Thursday, July 14, 2016, Stefano Fioravanzo wrote:
Hello,
I currently have a table with 307524407 rows and 52 columns. It was loaded from a 78GB csv file, but I do not know how much space the table actually takes on disk. If I query the table with a simple query like:

select count(*) from big_table where field = 1;

mserver5 crashes with 'segmentation fault'. I also tried the following query (adding a LIMIT clause):

select count(*) from farmap_movimento where farmacia_id = 1 limit 10;

and the output from mserver5 was:

mserver5: gdk_select.c:869: fullscan_int: Assertion `cnt < (bn)->S->capacity' failed. Aborted

Doing a more complex query (the one I am aiming for), with a group by and a few aggregation functions (sum, min) on the fields, the output is:

mserver5: gdk_bat.c:2066: BATsetcount: Assertion `b->S->capacity >= cnt' failed. Aborted

This seems to be a table-size problem, so… is there anything I can do?

I am using MonetDB 5 server v11.22.0. The machine has 30GB of RAM and 60GB of swap. Available disk space: ~50GB.
Any help would be much appreciated,
Regards,
Stefano
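As an aside on the on-disk size question above: MonetDB exposes per-column storage statistics through the sys.storage view, so the table's footprint can be inspected with a query along these lines (a sketch only; the exact set of sys.storage columns varies somewhat across versions):

select "table", "column", count, columnsize, heapsize from sys.storage where "table" = 'big_table';

Summing columnsize and heapsize over the result approximates the table's total on-disk size.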
Actually, I just remembered tqgen has a couple of other package dependencies from GitHub, so the best way to download and build it is:

$ go get github.com/niocs/tqgen
$ cd $GOPATH/bin
$ ./tqgen --OutFilePat /path/to/data/tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20210831
Thanks,
Sahas
Hi,

we successfully run 100GB TPC-H on a 16GB machine, so I would not bet on the data volume being the only/main reason for the problems you are experiencing.

You are apparently using a non-released version of MonetDB (some arbitrary state of the development branch somewhere between the Jul2015 and Jun2016 releases?); for that, there are even fewer "guarantees" than for a released version. Did you possibly also modify the code yourself?

I'd strongly suggest using an official release, preferably the latest Jun2016 release (11.23.3). Note that you might need to (dump and) reload your database; an upgrade from (or to) a non-released version is not guaranteed to work correctly --- only database upgrades between consecutive releases are supported, and even then a backup before upgrading is strongly advised.

Best, Stefan
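For the dump-and-reload step, a minimal sketch using MonetDB's standard tools on a monetdbd-managed farm (the database names are placeholders):

$ # dump schema and data from the old database
$ msqldump -d olddb > olddb_dump.sql
$ # create and release a fresh database under the new release, then reload
$ monetdb create newdb && monetdb release newdb
$ mclient -d newdb < olddb_dump.sql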
Hello,
Just a suggestion, if you think the issue is related to the size of the data, then you could use a random data generator to test it. We were facing a similar issue with shareable data for testing. So I wrote a program called tqgen in Golang ( https://github.com/niocs/tqgen ).
It can be used to generate large amounts of csv data for testing, and to share the data, we just need to share the command we used to generate it, and the other person can run it to generate the exact same data.
It produces datewise fictional trade/quote data for fictional stocks. It puts each date's data into a separate file, and each file is around 128MB, so if we run it for a daterange of 1.5 years, it should generate around 70GB of data. It takes a while to finish because the data is not completely random and it is related to other rows.
To generate, you could run: $ git clone https://github.com/niocs/tqgen $ cd tqgen $ go build tqgen.go $ ./tqgen --OutFilePat /path/to/data/tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20210831
I usually load them one by one so that I could record load time stats, etc, but you could use this bash script ( https://github.com/niocs/hcsv/blob/master/hcat ) to concatenate them without repeating the headers, and then load to MonetDB. (I will later add a parameter to tqgen to save to a single file)
To do that, $ git clone https://github.com/niocs/hcsv $ hcsv/hcat /path/to/data/tq.20*.csv > /path/to/data/tq.big.csv
If this causes the same issue, then others could regenerate the same data to try. Hope this helps.
Thanks, Sahas
On Jul 14, 2016 10:20 AM, "Anthony Damico" < ajdamico@gmail.com > wrote:
can you winnow down the problem enough to construct a reproducible example? thanks
On Thursday, July 14, 2016, Stefano Fioravanzo < fioravanzos@gmail.com > wrote:
Hello,
No the data is not public, and testing in on another machine is not an easy option right now.
Regards,
Stefano
On 14 Jul 2016, at 15:49, Anthony Damico < ajdamico@gmail.com > wrote:
hi, any chance these data are public? if not, can you test on a separate box and see if the exact same thing happens or if it's hardware-specific?
On Thursday, July 14, 2016, Stefano Fioravanzo < fioravanzos@gmail.com > wrote:
Hello,
I currently have a table with 307524407 rows and 52 columns, it was loaded from a 78GB cvs file, but I do not know how much space the table actually takes on disk. If I query the table with a simple query like:
select count(*) from big_table where field =1;
mserver5 crashes with ‘segmentation fault' I tried also the following query (adding the LIMIT statement): select count(*) from farmap_movimento where farmacia_id =1 limit 10;
and the output from mserver5 was: mserver5: gdk_select.c:869: fullscan_int: Assertion `cnt < (bn)->S->capacity' failed. Aborted
Doing a more complex query (the one I am aiming for), with also a group by and a few aggregation functions (sum, min) on the fields, the outputs is: mserver5: gdk_bat.c:2066: BATsetcount: Assertion `b->S->capacity >= cnt' failed. Aborted
This seems to be a table size problem, so…is there anything I can do?
I am using MonetDB 5 server v11.22.0 The machine has 30GB of RAM and 60GB of swap. Available disk space: ~50GB
Any help would be much appreciated,
Regards,
Stefano
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
... also, the assertions suggest that either the code base is erroneous or the database is corrupted ...
... either might be related to using an arbitrary non-released version of the "default" (development) branch ...
Finally, when reporting problems with non-released (and hence non-supported) versions of MonetDB, you might want to share the exact Mercurial (HG) changeset ID of the code version you're using, and whether or not you changed the code base (and if so, how).

Best, Stefan
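For reference, the changeset ID can be read directly from the source checkout; a quick sketch, assuming the working copy lives in ~/monetdb-src (the path is a placeholder):

$ cd ~/monetdb-src
$ hg identify --id                     # short hash; a trailing + means local modifications
$ hg log -r . --template '{node}\n'    # full 40-character changeset hash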
Thank you all for the support.

No, we did not change the code or anything. I think we will install the latest stable release before trying anything else. If the issue persists I will let you know.

Thanks again,
Regards,
Stefano
participants (3):
- Sahasranaman M S
- Stefan Manegold
- Stefano Fioravanzo