[MonetDB-users] ascii_io
Greetings!
Looking at the ascii_io module, I can't seem to find a way for loading up multiple BATs where one column of the input is specified as the head value. Is there such an option, or is it always a VOID loadup?
In particular, I have a << lng >> column as the primary key. So, using my current understanding, I would need to load up a set of dummy/temp BATs, and then using these load the data to the real BATs.
Regards! Ed
On Fri, Feb 11, 2005 at 03:17:16PM -0500, Edmund Dengler wrote:
> Greetings!
> Looking at the ascii_io module, I can't seem to find a way for loading up multiple BATs where one column of the input is specified as the head value. Is there such an option, or is it always a VOID loadup?
> In particular, I have a << lng >> column as the primary key. So, using my current understanding, I would need to load up a set of dummy/temp BATs, and then using these load the data to the real BATs.
Ascii_io was designed for loading wide tables. Each column of the table is loaded into a bat[void,val]. The void heads link the columns together. So if one of the columns of this table is your primary key (lng), that is not a problem. Probably you could do without the primary key (unless it is referenced in one of the other columns).
Niels
> Regards! Ed
------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
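The layout Niels describes can be pictured with a minimal Python sketch. This is not MonetDB code and not the ascii_io API; it only assumes that each column of a delimited text table becomes its own list and that the list position plays the role of the dense void head linking the columns. The function name `load_columns`, the `|` separator, and the sample rows are hypothetical.

```python
def load_columns(text, types, sep="|"):
    """Parse delimited text into one list per column.

    The list index stands in for the implicit (void) head: position i
    in every column belongs to the same input row.
    """
    columns = [[] for _ in types]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != len(types):
            raise ValueError(f"expected {len(types)} fields, got {len(fields)}")
        for col, convert, field in zip(columns, types, fields):
            col.append(convert(field))
    return columns


# Hypothetical three-column table whose first column is a 64-bit key.
sample = "10000000000|alice|3\n10000000001|bob|5\n"
keys, names, counts = load_columns(sample, [int, str, int])

# A row is reassembled by reading the same position from every column.
row1 = (keys[1], names[1], counts[1])
```

In this picture the key is just another value column, which matches the remark that a lng primary key "is not a problem".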
Hi Niels!
So, if I was doing multiple loads (which would be how I would be operating, see previous posts re structuring for multiple loads), then I would definitely need to load to dummy/tmp BATs, and then append my data to the real BATs (as I will be exceeding the 1 Billion row limit using a 30-bit OID)?
The << lng >> is being used since there is no 64-bit VOID.
Regards! Ed
On Fri, Feb 11, 2005 at 03:45:06PM -0500, Edmund Dengler wrote:
> Hi Niels!
> So, if I was doing multiple loads (which would be how I would be operating, see previous posts re structuring for multiple loads), then I would definitely need to load to dummy/tmp BATs, and then append my data to the real BATs (as I will be exceeding the 1 Billion row limit using a 30-bit OID)?
> The << lng >> is being used since there is no 64-bit VOID.
Assuming you are using a 32-bit box, you're right. But on a 32-bit box you will not run out of rows because of the 32-bit number, but because you run out of addressing space. Remember, MonetDB is a main-memory database; this means that a bat needs to fit into memory. So a single bat should be smaller than 4GB (to be precise, < 2GB because of signedness issues). You're right that ascii_io should append to your existing bats, but I don't see that as a problem; bat (bulk) inserts are fast.
Niels
> Regards! Ed
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
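The "load into temporary BATs, then append to the real ones" pattern discussed above can be sketched in Python as follows. This is an illustration only, assuming columns are plain lists; `append_chunk` and the column names are hypothetical and do not correspond to any MonetDB call.

```python
def append_chunk(table, chunk):
    """Append freshly loaded temporary columns to the long-lived ones.

    Returns the position (the dense row id in this sketch) at which the
    appended rows start.
    """
    if set(chunk) != set(table):
        raise ValueError("chunk columns do not match table columns")
    table_lengths = {len(col) for col in table.values()}
    chunk_lengths = {len(col) for col in chunk.values()}
    if len(table_lengths) != 1 or len(chunk_lengths) != 1:
        raise ValueError("columns are not all the same length")
    first_new = table_lengths.pop()
    for name, values in chunk.items():
        table[name].extend(values)  # one bulk extend per column
    return first_new


# Hypothetical persistent table and two separately loaded chunks.
table = {"key": [], "payload": []}
start_a = append_chunk(table, {"key": [1, 2], "payload": ["a", "b"]})
start_b = append_chunk(table, {"key": [3], "payload": ["c"]})
```

All lengths are checked before anything is extended, so a malformed chunk leaves the persistent columns untouched and still aligned.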
Edmund,
a small question (just short for now, I'll add more details later once I have more time ...):
are you using MonetDB on a 32bit or on a 64bit system?
On 32bit systems, a single BAT in MonetDB cannot contain more than 2^31 rows, and it cannot be larger than 2GB (on some OS maybe 4GB), whichever is less.
The reason is that BATs are basically "just" arrays (neglecting the heap for variable sized types for now), hence each BAT has to fit entirely into your address space, and that's obviously limited to 32bit on a 32bit system... Actually, all BATs you're using at a time need to fit into 2GB/4GB...
I'm sorry if I have to disappoint you, but holding 500M rows in a single BAT of say [void,int], i.e., requiring 4 bytes per row, will already hit the 2GB limit on 32bit systems.
Sorry for that, but that's how MonetDB is designed...
Obviously, on 64bit systems, the limit is the 63/64 bit address space, which should be enough for some time ... Btw, on 64bit systems MonetDB's OIDs are also 64 (63) bit...
As I said, I'm quite busy right now (pre-release bug fixing, etc...), but I'll come back to your previous questions later (probably only tomorrow, or on Sunday, though...)
Sorry for any inconvenience that this news might have caused...
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
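The arithmetic behind these limits is easy to check. The following Python sketch assumes the figures quoted in the message (a 2 GB ceiling per BAT, read here as 2 GiB, and at most 2^31 rows) and fixed-width values only; the helper name `max_rows` is hypothetical.

```python
GIB = 2**30


def max_rows(bytes_per_row, limit_bytes=2 * GIB, max_row_count=2**31):
    """Largest row count that fits both the size limit and the row limit."""
    return min(limit_bytes // bytes_per_row, max_row_count)


int_rows = max_rows(4)  # 4-byte int tail, as in [void,int]
lng_rows = max_rows(8)  # 8-byte lng tail, as in [void,lng]

# 500M rows of 4-byte values need 2,000,000,000 bytes, which sits just
# under 2 GiB (2,147,483,648 bytes) and leaves almost no headroom.
example_bytes = 500_000_000 * 4
```

Under these assumptions a single [void,int] column tops out at about 537 million rows and a [void,lng] column at about 268 million, well before the 2^31 row limit matters.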
Hi Stefan!
No problem, I'm still in experimental mode, so I'm trying to find the limits/bounds etc.
On Fri, 11 Feb 2005, Stefan Manegold wrote:
> Edmund,
> a small question (just short for now, I'll add more details later once I have more time ...):
> are you using MonetDB on a 32bit or on a 64bit system?
Currently using a 32-bit Linux system as the test base.
> On 32bit systems, a single BAT in MonetDB cannot contain more than 2^31 rows, and it cannot be larger than 2GB (on some OS maybe 4GB), whichever is less.
> The reason is that BATs are basically "just" arrays (neglecting the heap for variable sized types for now), hence each BAT has to fit entirely into your address space, and that's obviously limited to 32bit on a 32bit system... Actually, all BATs you're using at a time need to fit into 2GB/4GB...
I assume this limit holds for the heaps as well? Given this limit, an average record BLOB size of 1kB would translate to about 2M rows per BAT. This would probably not be an issue if there could be persistent BATs within BATs.
One solution would then be to partition the data into sets, hoping the queries would not die as I run them over each partitioned set. Trying to manage this using the global name space would seem to be quite a hack.
As an aside: even if I used the global namespace, and had temporary BATs to manage the collection of persistent BATs, how would the various algorithms perform? I assume I would need BAT loops to run over each sub-BAT, aggregate results, and then run a final query to get the real results. Would this just kill performance, or should it be fairly close to "one large BAT" (assuming a query would run empty on most BATs and using some accelerator to hopefully determine that)?
> I'm sorry if I have to disappoint you, but holding 500M rows in a single BAT of say [void,int], i.e., requiring 4 bytes per row, will already hit the 2GB limit on 32bit systems.
> Sorry for that, but that's how MonetDB is designed...
> Obviously, on 64bit systems, the limit is the 63/64 bit address space, which should be enough for some time ... Btw, on 64bit systems MonetDB's OIDs are also 64 (63) bit...
Unfortunately, this is not an option on most Intel hardware. 64-bit Xeons running a 64-bit Linux are out of bounds both on budget and on reliability without a lot of testing for us at the moment.
> As I said, I'm quite busy right now (pre-release bug fixing, etc...), but I'll come back to your previous questions later (probably only tomorrow, or on Sunday, though...)
> Sorry for any inconvenience that this news might have caused...
Thanks for the info so far! As I said, I am currently evaluating whether this would meet our needs, so I need to know what boundaries exist, and if there are any reasonable work-arounds.
Regards! Ed
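The partitioning idea raised in this message (loop over sub-BATs, aggregate partial results, skip partitions that cannot match) can be sketched in Python. This is only an illustration of the general pattern, not a statement about how MonetDB behaves or performs; the `Partition` class, the min/max summary standing in for an "accelerator", and `sum_in_range` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    keys: list    # stands in for one sub-BAT's key column
    values: list  # stands in for one sub-BAT's value column

    def __post_init__(self):
        # Tiny per-partition summary used to skip partitions cheaply.
        self.lo = min(self.keys) if self.keys else None
        self.hi = max(self.keys) if self.keys else None


def sum_in_range(partitions, lo, hi):
    """Sum values whose key lies in [lo, hi], one partition at a time.

    Returns the total and the number of partitions actually scanned.
    """
    total = 0
    scanned = 0
    for part in partitions:
        if part.lo is None or part.hi < lo or part.lo > hi:
            continue  # the summary proves this partition cannot match
        scanned += 1
        total += sum(v for k, v in zip(part.keys, part.values) if lo <= k <= hi)
    return total, scanned


# Three hypothetical partitions with disjoint key ranges.
parts = [
    Partition([1, 2, 3], [10, 20, 30]),
    Partition([4, 5, 6], [40, 50, 60]),
    Partition([7, 8, 9], [70, 80, 90]),
]
total, scanned = sum_in_range(parts, 5, 7)
```

When key ranges rarely overlap the query range, most partitions are rejected by the summary check alone, which is the situation the question above hopes for; whether that holds depends on how the data is partitioned.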
participants (3)
- Edmund Dengler
- Niels Nes
- Stefan.Manegold@cwi.nl