creating a binary file for "COPY binary ..."
Hi all - I'm creating a large db table with several columns. To speed it up, I'm using monetdb's "COPY BINARY" command. I have a question on creating the binary files. What I've done is use java's DataOutputStream to create the binary file - using code that looks as below (sometimes an int, sometimes a real or byte): writerChr = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(chrFile))); writerChr.writeInt(intChrom); When I read this file using a test program and java's DataInputStream, it correctly shows me the integer values I've written. In the "writerChr" case, all values in this particular column are "10". However, when I load this into monetdb the value is way off - it shows a large number. The command I am using to copy this file and one other into a table with 2 "int" columns is: COPY binary into justchrpos from ('/home/lcj34/chrFileC10.bin','/home/lcj34/posFileC10.bin'); It copies in fine, but when I cheek the values I see this: (not 10!) sql>\d justchrpos CREATE TABLE "testbig"."justchrpos" ( "chr" INTEGER, "pos" INTEGER ); sql>select * from justchrpos limit 10; +-----------+-----------+ | chr | pos | +===========+===========+ | 167772160 | 16777216 | | 167772160 | 33554432 | | 167772160 | 50331648 | | 167772160 | 67108864 | | 167772160 | 83886080 | | 167772160 | 100663296 | | 167772160 | 117440512 | | 167772160 | 134217728 | | 167772160 | 150994944 | | 167772160 | 167772160 | +-----------+-----------+ 10 tuples (3.601ms) sql> QUESTION: Is using java's DataOutputStream a valid way to create the binary files for monetdb? If not, what should I be using? As I mentioned above, when I read back the created file using java's DataInputStream, I see the values "10" as I expect. Thanks - Lynn
hi The binary files should be ordinary binary files you typical create with a C program. So the question is if a minimal C program can correctly read your numbers produced through Java. Be aware of the relative sizes of the int representations you may encounter (32/64 bits). regards, Martin On 18/01/16 23:37, Lynn Carol Johnson wrote:
Hi all -
I’m creating a large db table with several columns. To speed it up, I’m using monetdb’s “COPY BINARY” command. I have a question on creating the binary files. What I’ve done is use java’s DataOutputStream to create the binary file – using code that looks as below (sometimes an int, sometimes a real or byte):
writerChr = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(chrFile)));
writerChr.writeInt(intChrom);
When I read this file using a test program and java’s DataInputStream, it correctly shows me the integer values I’ve written. In the “writerChr” case, all values in this particular column are “10”.
However, when I load this into monetdb the value is way off – it shows a large number. The command I am using to copy this file and one other into a table with 2 “int” columns is:
COPY binary into justchrpos from ('/home/lcj34/chrFileC10.bin','/home/lcj34/posFileC10.bin’);
It copies in fine, but when I cheek the values I see this: (not 10!)
sql>\d justchrpos
CREATE TABLE "testbig"."justchrpos" (
"chr" INTEGER,
"pos" INTEGER
);
sql>select * from justchrpos limit 10;
+-----------+-----------+
| chr | pos |
+===========+===========+
| 167772160 | 16777216 |
| 167772160 | 33554432 |
| 167772160 | 50331648 |
| 167772160 | 67108864 |
| 167772160 | 83886080 |
| 167772160 | 100663296 |
| 167772160 | 117440512 |
| 167772160 | 134217728 |
| 167772160 | 150994944 |
| 167772160 | 167772160 |
+-----------+-----------+
10 tuples (3.601ms)
sql>
QUESTION: Is using java’s DataOutputStream a valid way to create the binary files for monetdb? If not, what should I be using? As I mentioned above, when I read back the created file using java’s DataInputStream, I see the values “10” as I expect.
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi, then problem is byte ordering (endianess): I guess Java.io.DataOutputStream.writeInt() writes high-byte-first (big endian), while MonetDB expects your systems byte-order, which is (most probably) little-endian (i.e., an x86 or x86_64 platform). decimal 10 represented as 4-byte integer is hex 0x000000A0 in high-byte-first (big endian); interpreting that as low-byte-first (little endian) yields decimal 167772160. You'd need to find a Java function that writes low-byte-first (little endian). (I'm a Java-illiterate and hence would not know which function to use ...) Best, Stefan ----- On Jan 18, 2016, at 11:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi all -
I’m creating a large db table with several columns. To speed it up, I’m using monetdb’s “COPY BINARY” command. I have a question on creating the binary files. What I’ve done is use java’s DataOutputStream to create the binary file – using code that looks as below (sometimes an int, sometimes a real or byte):
writerChr = new DataOutputStream( new BufferedOutputStream( new FileOutputStream( chrFile )));
writerChr .writeInt( intChrom );
When I read this file using a test program and java’s DataInputStream, it correctly shows me the integer values I’ve written. In the “writerChr” case, all values in this particular column are “10”.
However, when I load this into monetdb the value is way off – it shows a large number. The command I am using to copy this file and one other into a table with 2 “int” columns is:
COPY binary into justchrpos from ('/home/lcj34/chrFileC10.bin','/home/lcj34/posFileC10.bin’);
It copies in fine, but when I cheek the values I see this: (not 10!)
sql>\d justchrpos
CREATE TABLE "testbig"."justchrpos" (
"chr" INTEGER,
"pos" INTEGER
);
sql>select * from justchrpos limit 10;
+-----------+-----------+
| chr | pos |
+===========+===========+
| 167772160 | 16777216 |
| 167772160 | 33554432 |
| 167772160 | 50331648 |
| 167772160 | 67108864 |
| 167772160 | 83886080 |
| 167772160 | 100663296 |
| 167772160 | 117440512 |
| 167772160 | 134217728 |
| 167772160 | 150994944 |
| 167772160 | 167772160 |
+-----------+-----------+
10 tuples (3.601ms)
sql>
QUESTION: Is using java’s DataOutputStream a valid way to create the binary files for monetdb? If not, what should I be using? As I mentioned above, when I read back the created file using java’s DataInputStream, I see the values “10” as I expect.
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Stephan and Martin - Thanks so much for the answers . I¹ll work on the
endian-ness.
On 1/18/16, 5:58 PM, "users-list on behalf of Stefan Manegold"
Hi,
then problem is byte ordering (endianess):
I guess Java.io.DataOutputStream.writeInt() writes high-byte-first (big endian), while MonetDB expects your systems byte-order, which is (most probably) little-endian (i.e., an x86 or x86_64 platform).
decimal 10 represented as 4-byte integer is hex 0x000000A0 in high-byte-first (big endian); interpreting that as low-byte-first (little endian) yields decimal 167772160.
You'd need to find a Java function that writes low-byte-first (little endian).
(I'm a Java-illiterate and hence would not know which function to use ...)
Best, Stefan
----- On Jan 18, 2016, at 11:37 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
Hi all -
I¹m creating a large db table with several columns. To speed it up, I¹m using monetdb¹s ³COPY BINARY² command. I have a question on creating the binary files. What I¹ve done is use java¹s DataOutputStream to create the binary file using code that looks as below (sometimes an int, sometimes a real or byte):
writerChr = new DataOutputStream( new BufferedOutputStream( new FileOutputStream( chrFile )));
writerChr .writeInt( intChrom );
When I read this file using a test program and java¹s DataInputStream, it correctly shows me the integer values I¹ve written. In the ³writerChr² case, all values in this particular column are ³10².
However, when I load this into monetdb the value is way off it shows a large number. The command I am using to copy this file and one other into a table with 2 ³int² columns is:
COPY binary into justchrpos from ('/home/lcj34/chrFileC10.bin','/home/lcj34/posFileC10.bin¹);
It copies in fine, but when I cheek the values I see this: (not 10!)
sql>\d justchrpos
CREATE TABLE "testbig"."justchrpos" (
"chr" INTEGER,
"pos" INTEGER
);
sql>select * from justchrpos limit 10;
+-----------+-----------+
| chr | pos |
+===========+===========+
| 167772160 | 16777216 |
| 167772160 | 33554432 |
| 167772160 | 50331648 |
| 167772160 | 67108864 |
| 167772160 | 83886080 |
| 167772160 | 100663296 |
| 167772160 | 117440512 |
| 167772160 | 134217728 |
| 167772160 | 150994944 |
| 167772160 | 167772160 |
+-----------+-----------+
10 tuples (3.601ms)
sql>
QUESTION: Is using java¹s DataOutputStream a valid way to create the binary files for monetdb? If not, what should I be using? As I mentioned above, when I read back the created file using java¹s DataInputStream, I see the values ³10² as I expect.
Thanks - Lynn
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Lynn Carol Johnson
-
Martin Kersten
-
Stefan Manegold