[Monetdb-developers] encoding of strings returned by Python client
Any chance the Python client could be modified to return strings in Unicode instead of UTF-8? Seems like a better fit for Python; for example, that's how Python's sqlite3 module behaves.

An alternative approach is to expose a set_encoding method on the connection that lets you specify the encoding of all text fields returned. If backward compatibility is an issue, the latter idea seems better even though it adds a knob.

Any comments?

(Is the maintainer of the Python adapter active even on this list? I didn't see many Python-specific posts in the archives.)

Thanks,
m
On 12/21/2009 03:35 AM, Mark Bucciarelli wrote:
Any chance the Python client could be modified to return strings in Unicode instead of UTF-8?
Yes it can.
Seems like a better fit for Python; for example, that's how Python's sqlite3 module behaves.
An alternative approach is to expose a set_encoding method on the connection that lets you specify the encoding of all text fields returned.
If backward compatibility is an issue, the latter idea seems better even though it adds a knob.
Any comments?
I will look into this and discuss.
(Is the maintainer of the Python adapter active even on this list? I didn't see many Python-specific posts in the archives.)
hi! - gijs
On 12/21/2009 01:53 PM, Gijs Molenaar wrote:
On 12/21/2009 03:35 AM, Mark Bucciarelli wrote:
Any chance the Python client could be modified to return strings in Unicode instead of UTF-8?
I can implement this and make it configurable, so that the current behaviour stays available for backwards compatibility. Does anyone else have an opinion about this? - gijs
On Mon, Dec 21, 2009 at 01:53:52PM +0100, Gijs Molenaar wrote:
On 12/21/2009 03:35 AM, Mark Bucciarelli wrote:
Is the maintainer of the Python adapter active even on this list?
hi!
:) Hi!

I see you committed my fix for the new-line issue. Great! I was surprised that the mapi backend behaved that way--was this really a Python client issue?

I hit another issue (this time for sure with the Python client) that I didn't open a tracker issue for--patch enclosed below. Pretty simple: DOUBLE's and REAL's map to floats, not ints.

I was tempted to add some unit tests for this basic type conversion stuff, but I couldn't understand how the tests are run, and I wasn't going to dig around the 5,000 lines of auto-tools code:

$ pwd
/home/mark/src/monetdb/clients/src/python
$ find . -name "Makefile*" -exec cat {} \; | wc -l
5499
$ find . -name "*.py" -exec cat {} \; | wc -l
5545

When the Unicode fix is committed, I think the initial (rudimentary but functional) release of a Django adapter for MonetDB will be done.

http://github.com/mbucc/monetdb-python

Is there an announcement list I should post to when I'm happy with the Django adapter?

Thanks,
m

To apply this patch, from root of Monetdb source tree, execute the following commands:

$ cd clients/src/python/monetdb/sql
$ patch -p0 < 2.monetdb.sql.converters.diff

--- /home/mark/converters.py.orig 2009-12-20 00:47:58.000000000 -0500
+++ ./converters.py 2009-12-20 00:48:16.000000000 -0500
@@ -42,8 +42,8 @@
     type_codes.WRD: int,
     type_codes.BIGINT: int,
     type_codes.SERIAL: int,
-    type_codes.REAL: int,
-    type_codes.DOUBLE: int,
+    type_codes.REAL: float,
+    type_codes.DOUBLE: float,
     type_codes.BOOLEAN: self.__bool,
     type_codes.DATE: self.__date,
     type_codes.TIME: self.__time,
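To illustrate why the mapping in the patch matters, here is a minimal sketch of a type-code-to-converter table. The type-code constants, the two dictionaries and the `convert` helper are hypothetical stand-ins, not the actual contents of converters.py, and the sketch assumes column values arrive from the server as text.

```python
# Hypothetical type codes; the real ones live in the client's type_codes module.
BIGINT = "bigint"
REAL = "real"
DOUBLE = "double"

# Table as it looked before the patch: floating-point columns parsed with int().
BUGGY_CONVERTERS = {BIGINT: int, REAL: int, DOUBLE: int}

# Table after the patch: REAL and DOUBLE parsed with float().
FIXED_CONVERTERS = {BIGINT: int, REAL: float, DOUBLE: float}


def convert(converters, type_code, text):
    """Turn the text form of a column value into a Python object."""
    return converters[type_code](text)


if __name__ == "__main__":
    print(convert(FIXED_CONVERTERS, DOUBLE, "3.25"))
    try:
        convert(BUGGY_CONVERTERS, DOUBLE, "3.25")
    except ValueError as exc:
        # int() cannot parse a decimal literal, so the old mapping breaks here.
        print("buggy mapping fails:", exc)
```

With the old table, any REAL or DOUBLE value that has a fractional part cannot be parsed at all, which is why the two-line change is enough.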
On 21-12-2009 18:41, Mark Bucciarelli wrote:
On Mon, Dec 21, 2009 at 01:53:52PM +0100, Gijs Molenaar wrote:
On 12/21/2009 03:35 AM, Mark Bucciarelli wrote:
Is the maintainer of the Python adapter active even on this list?
hi!
:)
Hi!
I see you committed my fix for the new-line issue. Great!
I was surprised that the mapi backend behaved that way--was this really a Python client issue?
I think it was, since escaping was not done properly. Please let me know if the new version is working for you.
I hit another issue (this time for sure with the Python client) that I didn't open a tracker issue for--patch enclosed below.
Pretty simple: DOUBLE's and REAL's map to floats, not ints.
ah nice. Will have a look tomorrow.
I was tempted to add some unit tests for this basic type conversion stuff, but I couldn't understand how the tests are run, and I wasn't going to dig around the 5,000 lines of auto-tools code:
$ pwd
/home/mark/src/monetdb/clients/src/python
$ find . -name "Makefile*" -exec cat {} \; | wc -l
5499
$ find . -name "*.py" -exec cat {} \; | wc -l
5545
The tests are in clients/src/python/test and inherit some basic python DB tests. I think they would belong in capabilities_monetdb.py.
When the Unicode fix is committed, I think the initial (rudimentary but functional) release of a Django adapter for MonetDB will be done.
http://github.com/mbucc/monetdb-python
Is there an announcement list I should post to when I'm happy with the Django adapter?
Really nice work. I was looking into this, but I thought it would be too much work ;). There was an adapter lying around on the web somewhere for an old version.
Hi Mark,

thanks for all your patches! I guess Gijs will also take care of checking in this one.

You are more than welcome to announce your Django adapter via monetdb-announce@lists.sourceforge.net and/or monetdb-users@lists.sourceforge.net. In case your message doesn't go through instantly (we keep posting access to our mailing lists rather limited to prevent spam), we'll take care of approving (or resending) it.

To run MonetDB tests, you also need to download/checkout, compile and install the "testing" package of MonetDB. This provides our "Mtest.py" testing tool.

To run the python tests in .../clients/src/python/test/ you also need the sql sources of MonetDB. Go to .../sql/src/test/mapi/Tests/ and run

Mtest.py . python_test_monetdb_sql_dbapi20 python_test_monetdb_sql_capabilities

Stefan

--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |
On 21 dec 2009, at 19:32, Stefan Manegold wrote:
To run the python tests in .../clients/src/python/test/ you need to also have the sql sources of MonetDB. Go to .../sql/src/test/mapi/Tests/ and run

Mtest.py . python_test_monetdb_sql_dbapi20 python_test_monetdb_sql_capabilities
This is the MonetDB way of running the tests. You can also run the (unittest) tests directly from clients/src/python/test with:

* python capabilities_monetdb.py
* python dbapi20_monetdb.py

This way you don't need the sql and monetdb sources. You can change the database port by setting the MAPIPORT environment variable, and the database by setting TSTDB. Username and password should be monetdb.

- gijs
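As a rough sketch of how a test module might pick up these settings, the helper below reads MAPIPORT and TSTDB from the environment. The function name `connection_settings` is hypothetical, and the fallback values (port 50000, database "demo") are assumptions for illustration, not values taken from the test suite itself.

```python
import os


def connection_settings(environ=None):
    """Collect connection settings for the tests from the environment."""
    env = os.environ if environ is None else environ
    return {
        # Assumed fallbacks; the real test modules may use different defaults.
        "port": int(env.get("MAPIPORT", "50000")),
        "database": env.get("TSTDB", "demo"),
        # The thread states that username and password should be monetdb.
        "username": "monetdb",
        "password": "monetdb",
    }


if __name__ == "__main__":
    print(connection_settings())
```

From a shell this corresponds to something like `MAPIPORT=50001 TSTDB=mydb python capabilities_monetdb.py`.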
On 21 dec 2009, at 13:53, Gijs Molenaar wrote:
On 12/21/2009 03:35 AM, Mark Bucciarelli wrote:
Any chance the Python client could be modified to return strings in Unicode instead of UTF-8?
Yes it can.
Seems like a better fit for Python; for example, that's how Python's sqlite3 module behaves.
Do you know the behavior of other db API's? For example the native MySQL Python API? How big is the problem to do a manual conversion to unicode in your wrapper? The problem is that there isn't a real standard way of doing this, at least not defined by the python db API 2.0. - gijs
On Tue, Dec 22, 2009 at 5:30 AM, Gijs Molenaar wrote:
Do you know the behavior of other db API's?
MySQLdb has a use_unicode keyword. The PostgreSQL Django adapter had a set_client_encoding() method on the connection object. The native SQLite3 Python module just returns Unicode.
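The sqlite3 behavior mentioned here is easy to check with the standard library alone. The sketch below is written for Python 3, where `str` is the Unicode type; under the Python 2 of this thread the returned object would have been a `unicode` instance. The table layout mirrors the one used elsewhere in the thread and is only an illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test (id int, s varchar(30))")

cafe = "caf\u00e9"  # Unicode text containing a non-ASCII character
cur.execute("insert into test values (?, ?)", (1, cafe))
cur.execute("select s from test where id = ?", (1,))
row = cur.fetchone()
con.close()

# The text column comes back as a Unicode string, not as UTF-8 bytes.
print(type(row[0]).__name__, repr(row[0]))
```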
How big is the problem to do a manual conversion to unicode in your wrapper?
Painful--I couldn't find a good place to hook in the code.
The problem is that there isn't a real standard way of doing this, at least not defined by the python db API 2.0.
Yup, the PEP is encoding agnostic. I though the path you started down was a good idea; pass a use_unicode keyword to the connection __init__(). Then if use_unicode = True, just decode("utf8") the string in __strip(), otherwise leave code as is. m
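A minimal sketch of that idea, written for Python 3 with explicit bytes: the class name `StringConverter` and its `strip` method are hypothetical stand-ins for the client's private `__strip()`, and the assumption that the raw field is a double-quoted UTF-8 byte string is mine, not something the thread confirms.

```python
class StringConverter:
    """Hypothetical stand-in for the client's string handling."""

    def __init__(self, use_unicode=False):
        # Default is False so existing callers keep getting UTF-8 bytes.
        self.use_unicode = use_unicode

    def strip(self, raw):
        """Drop the surrounding quotes and optionally decode the field."""
        value = raw[1:-1]
        if self.use_unicode:
            return value.decode("utf-8")
        return value


if __name__ == "__main__":
    raw = b'"caf\xc3\xa9"'
    print(repr(StringConverter(use_unicode=True).strip(raw)))
    print(repr(StringConverter().strip(raw)))
```

Keeping the flag off by default is what makes the change backward compatible: only callers that opt in see Unicode objects.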
Hi Mark,

I just committed the REAL/DOUBLE fix. This encoding thingy has to wait until January, sorry.
On Tue, Dec 22, 2009 at 9:58 AM, Gijs Molenaar wrote:
Just committed the REAL/DOUBLE fix, this encoding thingy has to wait until January, sorry.
One thing I noticed is that the mapi backend has some magic to handle both Unicode and UTF-8 strings when passed in. Enclosed below is the Python script I used to exercise the encoding machinery.

Thanks!
m

import monetdb.sql
from monetdb.monetdb_exceptions import OperationalError

con = monetdb.sql.connect(username="test", password="test", database="test")
c = con.cursor()

try:
    c.execute('drop table test')
    c.execute('COMMIT')
except OperationalError, e:
    c.execute('ROLLBACK')
    print "drop table test failed:", e
    pass

c.execute('create table test (id int, s varchar(30))')
c.execute('COMMIT')

#
# cafe is in unicode here.
#
cafe = u"caf" + unichr(0x00E9)
print 'cafe.__repr__() =', cafe.__repr__()
print 'cafe.encode("utf8").__repr__() =', cafe.encode("utf8").__repr__()

c.execute('insert into test values (%s,%s)', (1,cafe))
#c.execute('insert into test values (%s,%s)', (1,cafe.encode("utf8")))
c.execute('COMMIT')

c.execute("select s from test where id = %d" % (1,))
row = c.fetchone()
print row
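Accepting both kinds of input can be sketched as a small normalization step. This is not how the mapi module actually does it; `to_utf8` is a hypothetical helper, written for Python 3, that shows one way to treat Unicode text and already-encoded UTF-8 bytes uniformly.

```python
def to_utf8(value):
    """Return UTF-8 encoded bytes for either text or already-encoded input."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        # Validate only; raises UnicodeDecodeError if the bytes are not UTF-8.
        value.decode("utf-8")
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


if __name__ == "__main__":
    cafe = "caf\u00e9"
    # Both spellings of the same value end up as identical bytes.
    print(to_utf8(cafe) == to_utf8(cafe.encode("utf-8")))
```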
Hello Mark,

I added a use_unicode flag to the Connection class. It controls whether strings are returned as Unicode under Python 2.x. The change is in HEAD of the MonetDB CVS tree. Can you please test if this is the expected behavior?

Thanks for your good work,

- gijs
On Tue, Jan 5, 2010 at 12:35 PM, Gijs Molenaar wrote:
I added a use_unicode flag to the Connection class. It controls whether strings are returned as Unicode under Python 2.x. The change is in HEAD of the MonetDB CVS tree.
Can you please test if this is the expected behavior?
Gijs, This works as advertised; my utf-8/unicode unit test passes. See: http://github.com/mbucc/monetdb-python/commit/13c4d694cd940b34728ec84787f400... Thanks, Mark
participants (3)
- Gijs Molenaar
- Mark Bucciarelli
- Stefan Manegold