Anthony,
given that it's hard for us to analyse / debug without being able to reproduce,
did you / could you simplify / "strip-down" you query to see when it fails?
E.g., you could try the following series of reduced versions and alternatives
of your query, and report which work and which don't?
Thanks!
Best,
Stefan
========
-- just the core join
SELECT
count(*)
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
;
-- core join and grouping
SELECT
count(*)
FROM (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
count(*)
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
) AS t;
-- create same table, but only with simple count aggregation
CREATE TABLE ccaed123_b_svcmon_xwalk AS (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
count(*)
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
) WITH DATA;
-- avoid CASE statement; rather add condition to join predicate
CREATE TABLE ccaed123_b_svcmon_xwalk AS (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
( MIN( CAST( ( b.svcmon - a.svcmon ) AS INTEGER ) ) ) AS svcmon_diff
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
AND
a.svcmon <= b.svcmon
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
) WITH DATA;
-- likewise, but with condition as extra filter rather than in join predicate
CREATE TABLE ccaed123_b_svcmon_xwalk AS (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
( MIN( CAST( ( b.svcmon - a.svcmon ) AS INTEGER ) ) ) AS svcmon_diff
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
WHERE
a.svcmon <= b.svcmon
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
) WITH DATA;
-- almost the original query, but avoid ABS()
CREATE TABLE ccaed123_b_svcmon_xwalk AS (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
( MIN( CAST( ( CASE
WHEN a.svcmon <= b.svcmon
THEN b.svcmon - a.svcmon
ELSE NULL
END ) AS INTEGER ) ) ) AS svcmon_diff
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
) WITH DATA;
-- original query
) WITH DATA;CREATE TABLE ccaed123_b_svcmon_xwalk AS (
SELECT
a.enrolid ,
a.yr ,
a.datatyp ,
a.svcmon AS ccaed123_b_svcmon ,
( MIN( CAST( ( CASE
WHEN a.svcmon <= b.svcmon
THEN ABS( a.svcmon - b.svcmon )
ELSE NULL
END ) AS INTEGER ) ) ) AS svcmon_diff
FROM
ccaed123_b_monthly AS a
INNER JOIN
ccaet123_b_st_wt AS b
ON
a.enrolid = b.enrolid
AND
a.yr = b.yr
AND
a.datatyp = b.datatyp
GROUP BY
a.enrolid ,
a.yr ,
a.datatyp ,
ccaed123_b_svcmon
========
--
----- On Nov 23, 2017, at 1:10 AM, Anthony Damico ajdamico@gmail.com wrote:
> hi, this query fails after i run nearly-identical queries on similar tables, so
> looking at #3791, something is lopsided in this particular table but not the
> others, i guess. here's the query:
>
>
> CREATE TABLE ccaed123_b_svcmon_xwalk AS
> ( SELECT
> a.enrolid ,
> a.yr ,
> a.datatyp ,
> a.svcmon AS ccaed123_b_svcmon ,
> ( MIN( CAST( ( CASE WHEN a.svcmon <= b.svcmon THEN ABS( a.svcmon - b.svcmon )
> ELSE NULL END ) AS INTEGER ) ) ) AS svcmon_diff
> FROM ccaed123_b_monthly AS a INNER JOIN ccaet123_b_st_wt AS b
> ON a.enrolid = b.enrolid AND a.yr = b.yr AND a.datatyp = b.datatyp
> GROUP BY a.enrolid , a.yr , a.datatyp , ccaed123_b_svcmon
> ) WITH DATA
>
>
>
> disk drive has 5.46 TB of free space. windows task manager says 603 GB of 768 GB
> of RAM are taken, but the R process only says it's holding 26GB of RAM. when i
> shutdown the database, the 603GB of RAM goes to less than 100GB (used by
> unrelated processes). then i re-start the database and get the same error, so i
> doubt this is resource-related?
>
>
>
> row counts of the two tables being merged:
>
>> dbGetQuery( db , "SELECT COUNT(*) FROM ccaed123_b_monthly" )
> L3
> 1 70024781
>> dbGetQuery( db , "SELECT COUNT(*) FROM ccaet123_b_st_wt" )
> L3
> 1 204029495
>
>
>
> here's the locale:
>
>
>> sessionInfo()
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows Server 2012 R2 x64 (build 9600)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] haven_1.1.0 stringr_1.2.0 snow_0.4-2 MonetDBLite_0.5.0
> [5] DBI_0.7 sqldf_0.4-11 RSQLite_2.0 gsubfn_0.6-6
> [9] proto_1.0.0 reshape2_1.4.2 R.utils_2.6.0 R.oo_1.21.0
> [13] R.methodsS3_1.7.1
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13 bindr_0.1 magrittr_1.5 bit_1.1-12
> [5] R6_2.2.2 rlang_0.1.2 dplyr_0.7.4 blob_1.1.0
> [9] plyr_1.8.4 tcltk_3.4.2 tools_3.4.2 assertthat_0.2.0
> [13] bit64_0.9-7 digest_0.6.12 tibble_1.3.4 bindrcpp_0.2
> [17] codetools_0.2-15 glue_1.2.0 memoise_1.1.0 stringi_1.1.5
> [21] compiler_3.4.2 forcats_0.2.0 chron_2.3-51 pkgconfig_2.0.1
>
>
>
> thanks for looking at this
>
>
> On Wed, Nov 22, 2017 at 7:00 AM, Stefan Manegold < stefan.manegold@cwi.nl >
> wrote:
>
>
>
> Allocating "only" about 2.8 GB does not ring alarms that the plan night be
> degenerated - assuming your data is not tiny.
>
> Could you share:
> OS, full query, data sizes, free disk space at time of failure, mserver5 process
> size at time of failure?
>
> Thanks,
> Stefan
>
>
> -------- Original message --------
> From: Anthony Damico < ajdamico@gmail.com >
> Date: 11/22/17 12:13 (GMT+01:00)
> To: Communication channel for MonetDB users < users-list@monetdb.org >
> Subject: algebra.join heapextend crash
>
> hi, i'm using monetdblite-r 0.5.0 on a confidential dataset. my syntax worked on
> monetdblite 0.3.1 on the same machine
>
>
> ftruncate: Invalid argument
> Error in .local(conn, statement, ...) :
> Unable to execute statement 'CREATE TABLE ccaed123_b_svcmon_xwalk AS
> ( SELECT
> a.enrolid ,
> a.yr ,
> a.dataty...'.
> Server says 'MALException:algebra.join:GDK reported error. !ERROR: GDKextendf: c
> !ERROR: HEAPextend: failed to extend to 2890530816 for 02\57\25766.tail: GDKmre
> map() failed '.
>
>
> it's a local drive, the hard disk isn't close to full and the machine has 700gb
> of ram, but the table it's trying to create might be bigger than that. dunno if
> related to https://www.monetdb.org/bugzilla/show_bug.cgi?id=3791
>
> thanks
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list