[MonetDB-users] (no subject)
Hi All - I am trying to evaluate MonetDB on Windows by loading up some large data sets. I'm not having much luck and I'm hoping someone can help me out. The first set of problems revolve around the ODBC driver. It seems to not be able to report errors or even return the results of queries reliably. Most errors come back as a generic error message although we occasionally run into strange messages involving UTF-16 conversions when no string columns are involved (e.g. a count distinct of a numeric column). We are using the July 2008 version from the web site. Are these known issues and is there a more recent build we could try? This is not in production yet, we are simply trying to evaluate the system. I have managed to work around the ODBC problems very slowly and have got to a point where I can load small data sets. But when I then try to do a large insert using the COPY FROM syntax, my client got a "connection terminated" error. When I take the COPY FROM statement and paste it into the client, the MonetDB server crashes. The only log files I can find have a few (mostly unprintable) characters in them. Restarting the server and trying again produces the same problem. The insert query is COPY 6568253 RECORDS INTO "Airlines"."_2005_On_Time_Performance" FROM 'C:\\DOCUME~1\\hawkfish\\LOCALS~1\\Temp\\1\\7A.tmp\\table.tbl' USING DELIMITERS '\t', '\n', '\''; Is there some limit on the COPY INTO syntax? Is there some log file somewhere that could point me in the right direction? TIA, ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
On 2008-10-13 22:23, Richard Wesley wrote:
Hi All -
I am trying to evaluate MonetDB on Windows by loading up some large data sets. I'm not having much luck and I'm hoping someone can help me out.
The first set of problems revolve around the ODBC driver. It seems to not be able to report errors or even return the results of queries reliably. Most errors come back as a generic error message although we occasionally run into strange messages involving UTF-16 conversions when no string columns are involved (e.g. a count distinct of a numeric column). We are using the July 2008 version from the web site. Are these known issues and is there a more recent build we could try? This is not in production yet, we are simply trying to evaluate the system.
I have managed to work around the ODBC problems very slowly and have got to a point where I can load small data sets. But when I then try to do a large insert using the COPY FROM syntax, my client got a "connection terminated" error. When I take the COPY FROM statement and paste it into the client, the MonetDB server crashes. The only log files I can find have a few (mostly unprintable) characters in them. Restarting the server and trying again produces the same problem. The insert query is
COPY 6568253 RECORDS INTO "Airlines"."_2005_On_Time_Performance" FROM 'C:\\DOCUME~1\\hawkfish\\LOCALS~1\\Temp\\1\\7A.tmp\\table.tbl' USING DELIMITERS '\t', '\n', '\'';
Is there some limit on the COPY INTO syntax? Is there some log file somewhere that could point me in the right direction?
The log you found is probably the write-ahead log of the database server which is used to guarantee transactions. The ODBC driver can produce a log if you start the process that loads the driver with an environment variable ODBCDEBUG with as value the file you would like to use to collect the log (i.e. something like set ODBCDEBUG=c:\odbc.log ) What kind of system are you using? 32 bit or 64 bit? XP? Vista? If a 32 bit system, remember that all data has to be addressable (i.e. less than 2GB), and this includes all temporary tables that are used while loading. (The database can be bigger, but all data that is referred to (used) in a query has to be addressable at the same time.) Conversions happen because the interface to the server uses UTF-8 and often the interface from the application to the driver (actually, the interface between the driver manager and the driver) uses wide characters (i.e. UTF-16). But I would be interested in fixing any bugs in that area, so if you can capture a log session where it goes wrong, I'd be grateful. -- Sjoerd Mullender
Hi Sjoerd - Thanks for responding so quickly. On 13 Oct 2008, at 13:48, Sjoerd Mullender wrote:
The ODBC driver can produce a log if you start the process that loads the driver with an environment variable ODBCDEBUG with as value the file you would like to use to collect the log (i.e. something like set ODBCDEBUG=c:\odbc.log )
Thanks, I'll try that. Is there any way to set it without an environment variable (e.g. an ODBC manager setting or a connect string attribute?) That would allow me to separate logging MonetDB data from the logs generated by the data source used for import. Although we have found that we can't seem to load a different ODBC driver at the same time.
What kind of system are you using? 32 bit or 64 bit? XP? Vista?
32 bit XP, but we want to be able to work on all of these.
If a 32 bit system, remember that all data has to be addressable (i.e. less than 2GB), and this includes all temporary tables that are used while loading. (The database can be bigger, but all data that is referred to (used) in a query has to be addressable at the same time.)
So this places a limit on the size of a database that can be accessed on a 32 bit machine? Or is it simply an insert limitation? Let me back up a bit. We are investigating using MonetDB for OLAP in a desktop product. We have no restrictions (currently) on the size of the database the user can analyse. Most users are running 32 bit XP. Can MonetDB handle databases larger than 2Gb on such a system?
Conversions happen because the interface to the server uses UTF-8 and often the interface from the application to the driver (actually, the interface between the driver manager and the driver) uses wide characters (i.e. UTF-16). But I would be interested in fixing any bugs in that area, so if you can capture a log session where it goes wrong, I'd be grateful.
I'll try to get some output for you. ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
On 2008-10-13 23:04, Richard Wesley wrote:
Hi Sjoerd -
Thanks for responding so quickly.
On 13 Oct 2008, at 13:48, Sjoerd Mullender wrote:
The ODBC driver can produce a log if you start the process that loads the driver with an environment variable ODBCDEBUG with as value the file you would like to use to collect the log (i.e. something like set ODBCDEBUG=c:\odbc.log )
Thanks, I'll try that. Is there any way to set it without an environment variable (e.g. an ODBC manager setting or a connect string attribute?) That would allow me to separate logging MonetDB data from the logs generated by the data source used for import. Although we have found that we can't seem to load a different ODBC driver at the same time.
That would be a nice feature, but it's not there at the moment. So, it has to be an environment variable for now.
What kind of system are you using? 32 bit or 64 bit? XP? Vista?
32 bit XP, but we want to be able to work on all of these.
If a 32 bit system, remember that all data has to be addressable (i.e. less than 2GB), and this includes all temporary tables that are used while loading. (The database can be bigger, but all data that is referred to (used) in a query has to be addressable at the same time.)
So this places a limit on the size of a database that can be accessed on a 32 bit machine? Or is it simply an insert limitation?
You can have lots of tables that amount to way more than 2 GB, but any single table (together with any temporary tables needed during querying) has to fit into the 32-bit address space (in practice 31 bits, i.e. 2GB).
Let me back up a bit. We are investigating using MonetDB for OLAP in a desktop product. We have no restrictions (currently) on the size of the database the user can analyse. Most users are running 32 bit XP. Can MonetDB handle databases larger than 2Gb on such a system?
Conversions happen because the interface to the server uses UTF-8 and often the interface from the application to the driver (actually, the interface between the driver manager and the driver) uses wide characters (i.e. UTF-16). But I would be interested in fixing any bugs in that area, so if you can capture a log session where it goes wrong, I'd be grateful.
I'll try to get some output for you. ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Sjoerd Mullender
Hey Sjoerd - On 13 Oct 2008, at 14:19, Sjoerd Mullender wrote:
Conversions happen because the interface to the server uses UTF-8 and often the interface from the application to the driver (actually, the interface between the driver manager and the driver) uses wide characters (i.e. UTF-16). But I would be interested in fixing any bugs in that area, so if you can capture a log session where it goes wrong, I'd be grateful.
I'll try to get some output for you.
How is this? ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
Richard Wesley wrote:
Hey Sjoerd -
On 13 Oct 2008, at 14:19, Sjoerd Mullender wrote:
Conversions happen because the interface to the server uses UTF-8 and often the interface from the application to the driver (actually, the interface between the driver manager and the driver) uses wide characters (i.e. UTF-16). But I would be interested in fixing any bugs in that area, so if you can capture a log session where it goes wrong, I'd be grateful.
I'll try to get some output for you.
How is this?
Beautiful. Your log forced me to take another close look at the code, and I think I found a bug that could explain the UTF-16 problem. Basically it is a matter of looking beyond the end of the string being converted to UTF-16 (internally the driver uses UTF-8, but since your application uses wide-character calls, result strings need to be converted). I think I nailed this bug, so I will commit this to our repository shortly. As to your question about6 the generic error, the driver often returns a generic error because a query was not acceptable to the server and the driver doesn't interpret the error messages that come back from the server (they are too varied and unstructured). So the best the driver can do is give a generic error. -- Sjoerd Mullender
On 16 Oct 2008, at 07:55, Sjoerd Mullender wrote:
Your log forced me to take another close look at the code, and I think I found a bug that could explain the UTF-16 problem. Basically it is a matter of looking beyond the end of the string being converted to UTF-16 (internally the driver uses UTF-8, but since your application uses wide-character calls, result strings need to be converted).
I think I nailed this bug, so I will commit this to our repository shortly.
Great! Is there a place where I can pick up nightly builds of the ODBC driver? ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
On 16 Oct 2008, at 07:55, Sjoerd Mullender wrote:
As to your question about6 the generic error, the driver often returns a generic error because a query was not acceptable to the server and the driver doesn't interpret the error messages that come back from the server (they are too varied and unstructured). So the best the driver can do is give a generic error.
I'm getting another strange problem where the ODBC driver issues a command and it fails, but if I then go and issue the same command in the client, it succeeds. I have attached another log file from a run that has this problem. The failure point is not consistent and changes between runs (the tables are dropped before each attempt to build the data). ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
On 2008-10-17 20:06, Richard Wesley wrote:
On 16 Oct 2008, at 07:55, Sjoerd Mullender wrote:
As to your question about6 the generic error, the driver often returns a generic error because a query was not acceptable to the server and the driver doesn't interpret the error messages that come back from the server (they are too varied and unstructured). So the best the driver can do is give a generic error.
I'm getting another strange problem where the ODBC driver issues a command and it fails, but if I then go and issue the same command in the client, it succeeds. I have attached another log file from a run that has this problem. The failure point is not consistent and changes between runs (the tables are dropped before each attempt to build the data).
I see a failure after a COPY 19348821 OFFSET 58046463 RECORDS INTO ... query. Could it be that either the server or the client is using too much memory? It could be that the server uses too much memory and therefore complains, but by the time you try it again, the server has maybe consolidated some tables and can again cope with the next batch. So one thing to check for is the memory footprint of the server. Niels, what do you think? The table consists of 7 integers and 2 float(53) and the application inserts batches of nearly 20M rows. -- Sjoerd Mullender
On Fri, Oct 17, 2008 at 10:01:26PM +0200, Sjoerd Mullender wrote:
On 2008-10-17 20:06, Richard Wesley wrote:
On 16 Oct 2008, at 07:55, Sjoerd Mullender wrote:
As to your question about6 the generic error, the driver often returns a generic error because a query was not acceptable to the server and the driver doesn't interpret the error messages that come back from the server (they are too varied and unstructured). So the best the driver can do is give a generic error.
I'm getting another strange problem where the ODBC driver issues a command and it fails, but if I then go and issue the same command in the client, it succeeds. I have attached another log file from a run that has this problem. The failure point is not consistent and changes between runs (the tables are dropped before each attempt to build the data).
I see a failure after a COPY 19348821 OFFSET 58046463 RECORDS INTO ... query. Could it be that either the server or the client is using too much memory? It could be that the server uses too much memory and therefore complains, but by the time you try it again, the server has maybe consolidated some tables and can again cope with the next batch.
So one thing to check for is the memory footprint of the server.
Niels, what do you think? The table consists of 7 integers and 2 float(53) and the application inserts batches of nearly 20M rows.
Probably its indeed some form of out of memory which causes the problem. Niels
-- Sjoerd Mullender
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On 17 Oct 2008, at 13:46, Niels Nes wrote:
I see a failure after a COPY 19348821 OFFSET 58046463 RECORDS INTO ... query. Could it be that either the server or the client is using too much memory? It could be that the server uses too much memory and therefore complains, but by the time you try it again, the server has maybe consolidated some tables and can again cope with the next batch.
So one thing to check for is the memory footprint of the server.
Niels, what do you think? The table consists of 7 integers and 2 float(53) and the application inserts batches of nearly 20M rows.
Probably its indeed some form of out of memory which causes the problem.
I could cut the batch size in half (I do it based on the file size.) Is the size limitation based on the input text or the built data? ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
Richard Wesley wrote:
On 17 Oct 2008, at 13:46, Niels Nes wrote:
I see a failure after a COPY 19348821 OFFSET 58046463 RECORDS INTO ... query. Could it be that either the server or the client is using too much memory? It could be that the server uses too much memory and therefore complains, but by the time you try it again, the server has maybe consolidated some tables and can again cope with the next batch.
So one thing to check for is the memory footprint of the server.
Niels, what do you think? The table consists of 7 integers and 2 float(53) and the application inserts batches of nearly 20M rows.
Probably its indeed some form of out of memory which causes the problem.
I could cut the batch size in half (I do it based on the file size.)
Is the size limitation based on the input text or the built data?
The batch size should not be the problem. It is the final table size. Are you running on a 32bits machine?
________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On 17 Oct 2008, at 16:48, Martin Kersten wrote:
The batch size should not be the problem. It is the final table size. Are you running on a 32bits machine?
Yes. So the final table size has to fit in memory? Even during inserts? ________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
On Mon, Oct 20, 2008 at 09:20:25AM -0700, Richard Wesley wrote:
On 17 Oct 2008, at 16:48, Martin Kersten wrote:
The batch size should not be the problem. It is the final table size. Are you running on a 32bits machine?
Yes. So the final table size has to fit in memory? Even during inserts?
During inserts (even only batches) all columns of the table are accessed. As MonetDB's memory maps columns it requires the adres space needed for all the data. So if your data exceeds the 32bit address space this will fail. Niels
________________________________________________________ Richard Wesley Senior Software Developer Tableau Software Visit: http://www.trytableau.com/now.html
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (4)
-
Martin Kersten
-
Niels Nes
-
Richard Wesley
-
Sjoerd Mullender