Table column type meta-data and the MonetDB API (MAPI)
Good day, I am working on improvements to the Pentaho MonetDB Bulk Loader plugin and want to get some advice for how to properly use the MonetDB API to get the MonetDB column data type meta-data for a given table. Here is the use case: A row of data in Pentaho Data Integration is composed of fields which have a basic Java data type and other meta data kept in a helper class that keeps track of format strings, digits of precision and so on. At the current state of development a user has to know exactly which data type was used for a particular column and in which order the columns were defined by the original SQL CREATE statement. In PDI, every field (column) starts life with primitive Java data types without necessarily having a specific format defined when the variable is initialized, a user might send a TIMESTAMP (i.e. 2013-01-03 00:00:00) to MonetDB where really a DATE (i.e 2013-01-03) was expected. What I would like to do is before the bulk load actually runs, ask MonetDB through the API what the meta data is for each of the columns and output to the end user a table that shows the columns reported by MonetDB for a given table, the data type specified in the DDL, and show the user a suggested format string, which they could modify if they know what they are doing. Format strings look like "yyyy-MM-dd HH:mm:ss" and "#,###.#" etc. Generally, what API calls or approach is recommended to accomplish what I am trying to do? As a fall back, I considered querying the meta-data out like so: SELECT name AS column_name, type as column_data_type FROM "sys"."columns" WHERE table_id = ( SELECT id AS TABLE_ID FROM "sys"."tables" WHERE name = 'DIM_DATE' and schema_id = (SELECT id AS SCHEMA_ID FROM "sys"."schemas" WHERE name = 'testing')) ORDER BY number; This brings back two columns, name and type where I can match them to the outgoing fields. Then I can process the result set in the step and update the value meta-data with the appropriate format mask. Running such SQL should be unnecessary but at least I have a fall-back solution. Thanks for the advice, Brandon Jackson _______________________________________________ developers-list mailing list developers-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/developers-list
On 2013-01-03 14:38, Brandon Jackson wrote:
Good day,
I am working on improvements to the Pentaho MonetDB Bulk Loader plugin and want to get some advice for how to properly use the MonetDB API to get the MonetDB column data type meta-data for a given table.
Here is the use case:
A row of data in Pentaho Data Integration is composed of fields which have a basic Java data type and other meta data kept in a helper class that keeps track of format strings, digits of precision and so on. At the current state of development a user has to know exactly which data type was used for a particular column and in which order the columns were defined by the original SQL CREATE statement. In PDI, every field (column) starts life with primitive Java data types without necessarily having a specific format defined when the variable is initialized, a user might send a TIMESTAMP (i.e. 2013-01-03 00:00:00) to MonetDB where really a DATE (i.e 2013-01-03) was expected.
What I would like to do is before the bulk load actually runs, ask MonetDB through the API what the meta data is for each of the columns and output to the end user a table that shows the columns reported by MonetDB for a given table, the data type specified in the DDL, and show the user a suggested format string, which they could modify if they know what they are doing. Format strings look like "yyyy-MM-dd HH:mm:ss" and "#,###.#" etc.
Generally, what API calls or approach is recommended to accomplish what I am trying to do?
As a fall back, I considered querying the meta-data out like so:
SELECT name AS column_name, type as column_data_type FROM "sys"."columns" WHERE table_id = ( SELECT id AS TABLE_ID FROM "sys"."tables" WHERE name = 'DIM_DATE' and schema_id = (SELECT id AS SCHEMA_ID FROM "sys"."schemas" WHERE name = 'testing')) ORDER BY number;
This brings back two columns, name and type where I can match them to the outgoing fields. Then I can process the result set in the step and update the value meta-data with the appropriate format mask. Running such SQL should be unnecessary but at least I have a fall-back solution.
Thanks for the advice,
Brandon Jackson
You could study how ODBC and JDBC do this. For ODBC, look particularly at ODBCInitResult(). One bit of magic that happens is that ODBC calls mapi_set_size_header(mid, 1); early on (see SQLConnect_()) to request an extra bit of information for each result set. Using this extra information, ODBC can figure out what the declared size is of char and varchar columns. As to the format string, I guess once you know what the SQL type is, you can just use fixed formats for each type. If you want to use pure SQL calls to get information about existing tables, you can study what msqldump does in the file clients/mapiclient/dump.c. It uses SQL queries to the system tables to find out the exact types of the tables that are to be dumped. But note that these queries tend to be quite large. -- Sjoerd Mullender _______________________________________________ developers-list mailing list developers-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/developers-list
participants (2)
-
Brandon Jackson
-
Sjoerd Mullender