additional details on IS NULL problem

Hi Anthony - I’ve narrowed down the problem. The bug isn’t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not. Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * …) into ‘/workdir/…kellyFirst50.txt using delimiters ‘\t’,’\n’; Then I created a new table called kellyFirst50 and used the “COPY INTO kellyfirst50 from ‘/workdir…kellyFirst50.txt’ using delimiters ‘\t,’\n’; ” command. Checking for “IS NULL” worked, so I thought it was a size problem. But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values. Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not: sql>\\ sql>select count(*) from weirtestnull; +------+ | L1 | +======+ | 50 | +------+ 1 tuple (2.014ms) sql>select count(amesbyd_weir) from weirtestnull; +------+ | L1 | +======+ | 50 | +------+ 1 tuple (0.865ms) sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL; +------+ | L1 | +======+ | 34 | +------+ 1 tuple (3.005ms) sql> sql>select * from weirtestnull where amesbyd_weir IS NULL; +-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+ | chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |> +=====+=====+===============+==============+==============+================+====================+==================+================+ +-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+ 0 tuples (4.041ms) !4 columns dropped! sql> Dump the data into a .txt file: sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n'; 50 affected rows (3.809ms) sql> Now drop the table, recreate it, but this time load from the txt file created above: sql>drop table weirtestnull; operation successful (1.948ms) sql>\copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n'; 50 affected rows (57.643ms) sql> Re-run the commands – I now get correct values for IS NULL: sql>select count(*) from weirtestnull; +------+ | L1 | +======+ | 50 | +------+ 1 tuple (1.851ms) sql>select count(amesbyd_weir) from weirtestnull; +------+ | L1 | +======+ | 34 | +------+ 1 tuple (0.983ms) sql> sql>select * from weirtestnull where amesbyd_weir IS NULL; +------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+ | chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |> : : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :> +======+========+============+============+============+============+=============+=============+=============+=============+=============+=============+ | 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | null | -0.00029885 | null | : : : 5 : : : 25 : : : 4 : : 1 : : | 10 | 228962 | null | null | null | null | null | null | null | null | null | null | | 10 | 228972 | null | null | null | null | null | null | null | null | null | null | | 10 | 229011 | null | null | null | null | null | null | null | null | null | null | | 10 | 229025 | null | null | null | null | null | null | null | null | null | null | | 10 | 229187 | null | null | null | null | null | null | null | null | null | null | | 10 | 229367 | null | null | null | null | null | null | null | null | null | null | | 10 | 229405 | null | null | null | null | null | null | null | null | null | null | | 10 | 229444 | null | null | null | null | null | null | null | null | null | null | | 10 | 229593 | null | null | null | null | null | null | null | null | null | null | | 10 | 229615 | null | null | null | null | null | null | null | null | null | null | | 10 | 229671 | null | null | null | null | null | null | null | null | null | null | | 10 | 229829 | null | null | null | null | null | null | null | null | null | null | | 10 | 230000 | null | null | null | null | null | null | null | null | null | null | | 10 | 230052 | null | null | null | null | null | null | null | null | null | null | | 10 | 230120 | null | null | null | null | null | null | null | null | null | null | +------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+ 16 tuples (5.2 My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get “null” stored in the db. When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files? With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere? Thanks - Lynn

Hi Lynn, thanks for diving into this! The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem! In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937 This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem. Of what type are your data / columns? Integer? decimal? floating point? string? If more then one type, do you experience these problems with all types or just with some? If only some, which? If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses? Otherwise a potential source of the problem --- I'm just speculating --- could be the following: You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation. For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ... Best, Stefan ----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I’ve narrowed down the problem. The bug isn’t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * …) into ‘/workdir/…kellyFirst50.txt using delimiters ‘\t’,’\n’;
Then I created a new table called kellyFirst50 and used the “COPY INTO kellyfirst50 from ‘/workdir…kellyFirst50.txt’ using delimiters ‘\t,’\n’; ” command. Checking for “IS NULL” worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+================+====================+==================+================+
+-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands – I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+=============+=============+=============+=============+=============+=============+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get “null” stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

For this particular run, the null¹s were all from Float.NaN, so perhaps
that is the problem.
My list-of-things-to-do includes checking if null works properly when
using *.MIN_VALUE, as we are using that as well for int and byte. We have
only recently changed our data to include ³null² as previously we had
Spark issues so everything was given a numeric value.
I will run and let you know. It will be no problem to store
Float.MIN_VALUE instead of Float.NaN
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============== ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+==== =========+=============+=============+=============+=============+======= ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Update: I changed my missing values to be Float.MIN_VALUE instead of
Float.NaN. This did not fix the problem. When they were Float.Nan, and
loaded from binary, the data showed up as “NULL” when doing a select
(though “count” did NOT seem to recognize this as null).
When these values are stored as Float.MIN_VALUE and loaded from binary
files, they appear in the data base as 1.401e-45. This is correct as this
is the value of Float.MIN_VALUE, but I had thought from reading the
documentation that this would be translated to null.
When I export the data to a text file, then drop/reload the table from the
text file, the values are still 1.401e-45, so neither IS NULL or IS NOT
NULL work.
The table from your link shows “nil" should be “GDK_flt_min”. Is that
value something other than Float.MIN_VALUE?
On 5/23/16, 4:30 PM, "users-list on behalf of Lynn Carol Johnson"
For this particular run, the null¹s were all from Float.NaN, so perhaps that is the problem.
My list-of-things-to-do includes checking if null works properly when using *.MIN_VALUE, as we are using that as well for int and byte. We have only recently changed our data to include ³null² as previously we had Spark issues so everything was given a numeric value.
I will run and let you know. It will be no problem to store Float.MIN_VALUE instead of Float.NaN
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
wrote: Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+------------- - --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============= = ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+------------- - --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+--- - ---------+-------------+-------------+-------------+-------------+------ - ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+=== = =========+=============+=============+=============+=============+====== = ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+--- - ---------+-------------+-------------+-------------+-------------+------ - ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hi Lynn,
here are the NIL/NULL values that MonetDB uses for 32-bit FLT (REAL)
and 64-bit DBL (FLOAT/DOUBLE), respectively:
$ cat x.c
#include
Update: I changed my missing values to be Float.MIN_VALUE instead of Float.NaN. This did not fix the problem. When they were Float.Nan, and loaded from binary, the data showed up as “NULL” when doing a select (though “count” did NOT seem to recognize this as null).
When these values are stored as Float.MIN_VALUE and loaded from binary files, they appear in the data base as 1.401e-45. This is correct as this is the value of Float.MIN_VALUE, but I had thought from reading the documentation that this would be translated to null.
When I export the data to a text file, then drop/reload the table from the text file, the values are still 1.401e-45, so neither IS NULL or IS NOT NULL work.
The table from your link shows “nil" should be “GDK_flt_min”. Is that value something other than Float.MIN_VALUE?
On 5/23/16, 4:30 PM, "users-list on behalf of Lynn Carol Johnson"
wrote: For this particular run, the null¹s were all from Float.NaN, so perhaps that is the problem.
My list-of-things-to-do includes checking if null works properly when using *.MIN_VALUE, as we are using that as well for int and byte. We have only recently changed our data to include ³null² as previously we had Spark issues so everything was given a numeric value.
I will run and let you know. It will be no problem to store Float.MIN_VALUE instead of Float.NaN
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
wrote: Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+------------- - --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============= = ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+------------- - --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+--- - ---------+-------------+-------------+-------------+-------------+------ - ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+=== = =========+=============+=============+=============+=============+====== = ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+--- - ---------+-------------+-------------+-------------+-------------+------ - ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

Thanks, Stephan.
Short response good news: using int_nil = ‹2147483648 (-2^31) is
translated properly to NULL for int values. The IS NULL and IS NOT NULL
queries are fine.
Short response bad news: I am unable to get NULL to work for Float
regardless of what value I used. Details below:
I replaced my Float.MIN_VALUE with flt_nil defined as below :
public static float flt_nil = (float) -3.40282e+38;
Ran again with just 50 rows (created as binary files). At this point my
belief is NULL does not work at all for float with binary files. There are
16 of the
50 values for column amesbyd_weir that are null, but I can only select
those values if I check for the column < -3.40282e+38. Here are the sql
results:
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.933ms)
sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir IS
NULL;
+-----+-----+--------------+
| chr | pos | amesbyd_weir |
+=====+=====+==============+
+-----+-----+--------------+
0 tuples (1.291ms)
sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir =
-3.40282e+38;
+-----+-----+--------------+
| chr | pos | amesbyd_weir |
+=====+=====+==============+
+-----+-----+--------------+
0 tuples (0.528ms)
sql>
sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir <
-3.40282e+38;
+------+--------+-----------------+
| chr | pos | amesbyd_weir |
+======+========+=================+
| 10 | 228864 | -3.40282e+38 |
| 10 | 228962 | -3.40282e+38 |
| 10 | 228972 | -3.40282e+38 |
| 10 | 229011 | -3.40282e+38 |
| 10 | 229025 | -3.40282e+38 |
| 10 | 229187 | -3.40282e+38 |
| 10 | 229367 | -3.40282e+38 |
| 10 | 229405 | -3.40282e+38 |
| 10 | 229444 | -3.40282e+38 |
| 10 | 229593 | -3.40282e+38 |
| 10 | 229615 | -3.40282e+38 |
| 10 | 229671 | -3.40282e+38 |
| 10 | 229829 | -3.40282e+38 |
| 10 | 230000 | -3.40282e+38 |
| 10 | 230052 | -3.40282e+38 |
| 10 | 230120 | -3.40282e+38 |
+------+--------+-----------------+
16 tuples (2.999ms)
sql>
NOTE: Dumping the 50 row database to a .txt file, dropping the table and
reloading into monetdb did NOT fix it as it did when I used Float.NaN
values. That is actually irrelevant for us as our db tables are large
(2.1 Billion rows in 1 table, 80M in another) and as we add additional
columns, binary loading is the only way to go. We love the binary load
option, and have been very happy with the speed of monetdb processing.
For now, we¹re going to follow PLINK convention and use -99 for our Float
null values.
Thanks - Lynn
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============== ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+==== =========+=============+=============+=============+=============+======= ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hi Lynn, public static float flt_nil = (float) -3.40282e+38; does not necessarily create the exact bit pattern as C macro -FLT_MAX, mainly as "-3.40282e+38" is only an approximation of the exact value: "-340282346638528859811704183484516925440" You can either try public static float flt_nil = (float) -340282346638528859811704183484516925440; or use the actual bit pattern in hex notation, e.g., public static float flt_nil = (float) 0xff7fffff; Please take my Java(?) code snippets with a grain of salt; I'm everything but a Java expert --- or better, I'm a Java illiterate ... Be also aware that C type float, MAL typr :flt and SQL type REAL are 32-bit, while C type double, MAL type :dbl, and SQL types FLOAT & DOUBLE are 64-bit Hence, for 64-bit types you'd need to use one of public static double dbl_nil = (double) -179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368; public static double dbl_nil = (double) 0xffefffffffffffff; Hope this helps ... Best, Stefan ----- On May 24, 2016, at 3:42 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Thanks, Stephan.
Short response good news: using int_nil = ‹2147483648 (-2^31) is translated properly to NULL for int values. The IS NULL and IS NOT NULL queries are fine.
Short response bad news: I am unable to get NULL to work for Float regardless of what value I used. Details below:
I replaced my Float.MIN_VALUE with flt_nil defined as below :
public static float flt_nil = (float) -3.40282e+38;
Ran again with just 50 rows (created as binary files). At this point my belief is NULL does not work at all for float with binary files. There are 16 of the 50 values for column amesbyd_weir that are null, but I can only select those values if I check for the column < -3.40282e+38. Here are the sql results:
sql>select count(amesbyd_weir) from weirtestnull; +------+ | L1 | +======+ | 50 | +------+ 1 tuple (1.933ms) sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir IS NULL; +-----+-----+--------------+ | chr | pos | amesbyd_weir | +=====+=====+==============+ +-----+-----+--------------+ 0 tuples (1.291ms) sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir = -3.40282e+38; +-----+-----+--------------+ | chr | pos | amesbyd_weir | +=====+=====+==============+ +-----+-----+--------------+ 0 tuples (0.528ms) sql> sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir < -3.40282e+38; +------+--------+-----------------+ | chr | pos | amesbyd_weir | +======+========+=================+ | 10 | 228864 | -3.40282e+38 | | 10 | 228962 | -3.40282e+38 | | 10 | 228972 | -3.40282e+38 | | 10 | 229011 | -3.40282e+38 | | 10 | 229025 | -3.40282e+38 | | 10 | 229187 | -3.40282e+38 | | 10 | 229367 | -3.40282e+38 | | 10 | 229405 | -3.40282e+38 | | 10 | 229444 | -3.40282e+38 | | 10 | 229593 | -3.40282e+38 | | 10 | 229615 | -3.40282e+38 | | 10 | 229671 | -3.40282e+38 | | 10 | 229829 | -3.40282e+38 | | 10 | 230000 | -3.40282e+38 | | 10 | 230052 | -3.40282e+38 | | 10 | 230120 | -3.40282e+38 | +------+--------+-----------------+ 16 tuples (2.999ms) sql>
NOTE: Dumping the 50 row database to a .txt file, dropping the table and reloading into monetdb did NOT fix it as it did when I used Float.NaN values. That is actually irrelevant for us as our db tables are large (2.1 Billion rows in 1 table, 80M in another) and as we add additional columns, binary loading is the only way to go. We love the binary load option, and have been very happy with the speed of monetdb processing.
For now, we¹re going to follow PLINK convention and use -99 for our Float null values.
Thanks - Lynn
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
wrote: Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============== ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+-------------- --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+==== =========+=============+=============+=============+=============+======= ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+---- ---------+-------------+-------------+-------------+-------------+------- ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

Yes, we’re using float as 32 bit, storing into a real.
Still no sunshine with your suggestions below. Java doesn’t allow me to
assign
public static float flt_nil = (float)
-340282346638528859811704183484516925440;
It complains “the literal -340282346638528859811704183484516925440 of type
int is out of range”. This isn’t surprising as 32 bits will only hold up
to -2^31 (-2,147,483,648)
Using static float flt_nil = (float) 0xff7fffff; gives me the value
-8388609.0, which is incorrect.
Does storing null work for you using “float” types? Is this just a
java-to-C issue? Still looks like we’ll need to pick a default other than
NULL for our “real” data.
Thanks - Lynn
On 5/24/16, 9:53 AM, "users-list on behalf of Stefan Manegold"
Hi Lynn,
public static float flt_nil = (float) -3.40282e+38;
does not necessarily create the exact bit pattern as C macro -FLT_MAX, mainly as "-3.40282e+38" is only an approximation of the exact value: "-340282346638528859811704183484516925440"
You can either try
public static float flt_nil = (float) -340282346638528859811704183484516925440;
or use the actual bit pattern in hex notation, e.g.,
public static float flt_nil = (float) 0xff7fffff;
Please take my Java(?) code snippets with a grain of salt; I'm everything but a Java expert --- or better, I'm a Java illiterate ...
Be also aware that C type float, MAL typr :flt and SQL type REAL are 32-bit, while C type double, MAL type :dbl, and SQL types FLOAT & DOUBLE are 64-bit
Hence, for 64-bit types you'd need to use one of
public static double dbl_nil = (double) -1797693134862315708145274237317043567980705675258449965989174768031572607 80028538760589558632766878171540458953514382464234321326889464182768467546 70353751698604991057655128207624549009038932894407586850845513394230458323 69032229481658085593321233482747978262041447231687381771809192998812504040 26184124858368;
public static double dbl_nil = (double) 0xffefffffffffffff;
Hope this helps ...
Best, Stefan
----- On May 24, 2016, at 3:42 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Thanks, Stephan.
Short response good news: using int_nil = ‹2147483648 (-2^31) is translated properly to NULL for int values. The IS NULL and IS NOT NULL queries are fine.
Short response bad news: I am unable to get NULL to work for Float regardless of what value I used. Details below:
I replaced my Float.MIN_VALUE with flt_nil defined as below :
public static float flt_nil = (float) -3.40282e+38;
Ran again with just 50 rows (created as binary files). At this point my belief is NULL does not work at all for float with binary files. There are 16 of the 50 values for column amesbyd_weir that are null, but I can only select those values if I check for the column < -3.40282e+38. Here are the sql results:
sql>select count(amesbyd_weir) from weirtestnull; +------+ | L1 | +======+ | 50 | +------+ 1 tuple (1.933ms) sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir IS NULL; +-----+-----+--------------+ | chr | pos | amesbyd_weir | +=====+=====+==============+ +-----+-----+--------------+ 0 tuples (1.291ms) sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir = -3.40282e+38; +-----+-----+--------------+ | chr | pos | amesbyd_weir | +=====+=====+==============+ +-----+-----+--------------+ 0 tuples (0.528ms) sql> sql>select chr,pos,amesbyd_weir from weirtestnull where amesbyd_weir < -3.40282e+38; +------+--------+-----------------+ | chr | pos | amesbyd_weir | +======+========+=================+ | 10 | 228864 | -3.40282e+38 | | 10 | 228962 | -3.40282e+38 | | 10 | 228972 | -3.40282e+38 | | 10 | 229011 | -3.40282e+38 | | 10 | 229025 | -3.40282e+38 | | 10 | 229187 | -3.40282e+38 | | 10 | 229367 | -3.40282e+38 | | 10 | 229405 | -3.40282e+38 | | 10 | 229444 | -3.40282e+38 | | 10 | 229593 | -3.40282e+38 | | 10 | 229615 | -3.40282e+38 | | 10 | 229671 | -3.40282e+38 | | 10 | 229829 | -3.40282e+38 | | 10 | 230000 | -3.40282e+38 | | 10 | 230052 | -3.40282e+38 | | 10 | 230120 | -3.40282e+38 | +------+--------+-----------------+ 16 tuples (2.999ms) sql>
NOTE: Dumping the 50 row database to a .txt file, dropping the table and reloading into monetdb did NOT fix it as it did when I used Float.NaN values. That is actually irrelevant for us as our db tables are large (2.1 Billion rows in 1 table, 80M in another) and as we add additional columns, binary loading is the only way to go. We love the binary load option, and have been very happy with the speed of monetdb processing.
For now, we¹re going to follow PLINK convention and use -99 for our Float null values.
Thanks - Lynn
On 5/23/16, 4:06 PM, "users-list on behalf of Stefan Manegold"
wrote: Hi Lynn,
thanks for diving into this!
The information that you loaded your data using from binary files using copy *binary* into is crucial for us to understand and investigate the problem!
In fact, I recently discovered and fixed a similar bug; cf., https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937
This fix is in the Jul2013-SP3 bug-fix release. Hence, I'm wondering why you (still) experience this (or a similar) problem.
Of what type are your data / columns? Integer? decimal? floating point? string?
If more then one type, do you experience these problems with all types or just with some? If only some, which?
If the problem occurs only with floating point numbers, are you sure you're using the exact NULL representation that MonetDB uses?
Otherwise a potential source of the problem --- I'm just speculating --- could be the following:
You might have "non-valid" floating point numbers like NaN or INF in your data that are not known to SQL and hence MonetDB. If so, When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's internal representation of) NULL. However, when loding your binary data, MonetDB stores these values "as-is" not recognizing them as NULL; then processing does not recognize them as NULL, either; only rendering produces NULL from the "non-valid" internal representation.
For the correct internal representation of NULL values in MonetDB per type see https://www.monetdb.org/wiki/MonetDB_type_system
If my speculation is true, we'd need to consider making our copy binary into bulk-loading for floating point numbers (more) fail-save by actually scanning each column and checking for "invalid" floating point number --- whatever that might cost ...
Best, Stefan
----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi Anthony -
I¹ve narrowed down the problem. The bug isn¹t related to size, but rather how the table data was created/loaded. Loading from a .txt file gives good behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to load. That is what was failing. When I first tested with 50 columns, I created the test file from sql via the command copy (select * Š) into Œ/workdir/ŠkellyFirst50.txt using delimiters Œ\t¹,¹\n¹;
Then I created a new table called kellyFirst50 and used the ³COPY INTO kellyfirst50 from Œ/workdirŠkellyFirst50.txt¹ using delimiters Œ\t,¹\n¹; ² command. Checking for ³IS NULL² worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java scripts to create binary files of ever decreasing sizes, then loaded them via the COPY BINARY commands. Checking for IS NULL always failed here, even when I only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+------------ -- --+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir | | amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+============ == ==+====================+==================+================+
+-----+-----+---------------+--------------+--------------+------------ -- --+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+-- -- ---------+-------------+-------------+-------------+-------------+----- -- ------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke | | dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir : eypen_weir :>
+======+========+============+============+============+============+== == =========+=============+=============+=============+=============+===== == ======+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749 | | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null | | null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null | | null |
+------+--------+------------+------------+------------+------------+-- -- ---------+-------------+-------------+-------------+-------------+----- -- ------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get ³null² stored in the db.
When we load directly from binary, are the null values created differently? Is something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (2)
-
Lynn Carol Johnson
-
Stefan Manegold