Re: additional details on IS NULL problem

23 May 2016

      Hi Lynn,

thanks for diving into this!

The information that you loaded your data using from binary files using
copy *binary* into is crucial for us to understand and investigate the problem!

In fact, I recently discovered and fixed a similar bug; cf.,
https://www.monetdb.org/bugzilla/show_bug.cgi?id=3937

This fix is in the Jul2013-SP3 bug-fix release.
Hence, I'm wondering why you (still) experience this (or a similar) problem.

Of what type are your data / columns?
Integer? decimal? floating point? string?

If more then one type, do you experience these problems with all types
or just with some?
If only some, which?

If the problem occurs only with floating point numbers,
are you sure you're using the exact NULL representation that MonetDB uses?

Otherwise a potential source of the problem --- I'm just speculating ---
could be the following:

You might have "non-valid" floating point numbers like NaN or INF in your data
that are not known to SQL and hence MonetDB.
If so,
When parsed from text, MonetDB does (seems to?) silently turn them into (MonetDB's
internal representation of) NULL.
However, when loding your binary data, MonetDB stores these values "as-is"
not recognizing them as NULL; then processing does not recognize them as NULL, either;
only rendering produces NULL from the "non-valid" internal representation.

For the correct internal representation of NULL values in MonetDB per type see
https://www.monetdb.org/wiki/MonetDB_type_system

If my speculation is true, we'd need to consider making our copy binary into
bulk-loading for floating point numbers (more) fail-save by actually scanning
each column and checking for "invalid" floating point number --- whatever that
might cost ...

Best,
Stefan

----- On May 23, 2016, at 8:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
...
Hi Anthony -
I’ve narrowed down the problem. The bug isn’t related to size, but rather how
the table data was created/loaded. Loading from a .txt file gives good
behavior, loading from binary does not.
Normally I create binary files for each column and use the binary bulk loader to
load. That is what was failing. When I first tested with 50 columns, I created
the test file from sql via the command
copy (select * …) into ‘/workdir/…kellyFirst50.txt using delimiters ‘\t’,’\n’;
Then I created a new table called kellyFirst50 and used the “COPY INTO
kellyfirst50 from ‘/workdir…kellyFirst50.txt’ using delimiters ‘\t,’\n’; ”
command. Checking for “IS NULL” worked, so I thought it was a size problem.
But, when trying to figure out at what point the file size mattered, I ran java
scripts to create binary files of ever decreasing sizes, then loaded them via
the COPY BINARY commands. Checking for IS NULL always failed here, even when I
only loaded 50 values.
Here are the results when loading from binary files into monetdb. Using IS NOT
NULL works, but IS NULL does not:
sql>\
operation successful
sql>\
50 affected rows
sql>
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (2.014ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (0.865ms)
sql>select count(amesbyd_weir) from weirtestnull where amesbyd_weir IS NOT NULL;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (3.005ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+
| chr | pos | amesbycn_weir | amesbyd_weir | amesbyf_weir | amesbynam_weir |
| amesbyturkeyp_weir | dentbyflint_weir | dentbynam_weir |>
+=====+=====+===============+==============+==============+================+====================+==================+================+
+-----+-----+---------------+--------------+--------------+----------------+--------------------+------------------+----------------+
0 tuples (4.041ms) !4 columns dropped!
sql>
Dump the data into a .txt file:
sql>copy (select * from weirtestnull) into
'/workdir/db_outfiles/weirtestnull.txt' using delimiters '\t','\n';
50 affected rows (3.809ms)
sql>
Now drop the table, recreate it, but this time load from the txt file created
above:
sql>drop table weirtestnull;
operation successful (1.948ms)
sql>\
operation successful
sql>copy into weirtestnull from '/workdir/db_outfiles/weirtestnull.txt' using
delimiters '\t','\n';
50 affected rows (57.643ms)
sql>
Re-run the commands – I now get correct values for IS NULL:
sql>select count(*) from weirtestnull;
+------+
| L1 |
+======+
| 50 |
+------+
1 tuple (1.851ms)
sql>select count(amesbyd_weir) from weirtestnull;
+------+
| L1 |
+======+
| 34 |
+------+
1 tuple (0.983ms)
sql>
sql>select * from weirtestnull where amesbyd_weir IS NULL;
+------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+
| chr | pos | amesbycn_w | amesbyd_we | amesbyf_we | amesbynam_ | amesbyturke |
| dentbyflint | dentbynam_w | dentbyturke | flintbynam_ | flintbyturk |>
: : : eir : ir : ir : weir : yp_weir : _weir : eir : ypen_weir : weir :
eypen_weir :>
+======+========+============+============+============+============+=============+=============+=============+=============+=============+=============+
| 10 | 228864 | 2.95496e-0 | null | null | -0.0001263 | null | null | -0.00028749
| | null | -0.00029885 | null |
: : : 5 : : : 25 : : : 4 : : 1 : :
| 10 | 228962 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 228972 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229011 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229025 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229187 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229367 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229405 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229444 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229593 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229615 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229671 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 229829 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 230000 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 230052 | null | null | null | null | null | null | null | null | null |
| null |
| 10 | 230120 | null | null | null | null | null | null | null | null | null |
| null |
+------+--------+------------+------------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+
16 tuples (5.2
My java program uses Float.NaN and Integer.MIN_VALUE, Byte.MIN_VALUE to get
“null” stored in the db.
When we load directly from binary, are the null values created differently? Is
something not marked that gets marked when loading from text files?
With this information, can you reproduce, or do you want me to give you my
binary files in a dropbox somewhere?
Thanks - Lynn
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
-- 
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |

Re: additional details on IS NULL problem

Stefan Manegold