Bulk Load, Same Table, Sequential COPY INTO, MAPI
Dear developers,

In the last two weeks we have made a lot of progress on the MonetDB Streaming Bulk Loader step for Pentaho Data Integration. Yesterday I tried to exercise it under conditions close to larger-scale production use and started getting errors along the lines of:

  "COMMIT: transaction is aborted because of concurrency conflicts, will ROLLBACK instead"

I am not sure when these clear up. In this case, after earlier errors like this, I established a new connection and simply wanted to create a table:

sql>CREATE TABLE "testing"."fact_cen_hist" (
    "CENSUS_SK" INT, "FACILITY_SK" INT, "RESIDENT_SK" INT, "RESIDENT_LOCATION_SK" INT,
    "DISCHARGED_TO" CLOB, "PAYOR" VARCHAR(20), "PAYOR_CATEGORY" VARCHAR(20),
    "CENSUS_STATUS" VARCHAR(50), "CENSUS_EVENT" VARCHAR(50), "IS_ACTIVE" INT,
    "IS_MEDICAID_DAY" INT, "IS_MEDICARE_DAY" INT, "IS_HOSPICE_DAY" INT,
    "IS_COMMERCIAL_INSURANCE_DAY" INT, "IS_PRIVATE_PAY_DAY" INT, "IS_MANAGED_CARE_DAY" INT,
    "IS_MEDICARE_SECOND_PAYOR" INT, "IS_MEDICAID_PENDING_DAY" INT, "DATE_OF_STAY" INT,
    "NEXT_EVENT_DATE" INT, "DATE_BEFORE_NEXT_EVENT" INT, "CENSUS_EVENT_DATE" INT,
    "IS_MEMORY_CARE_DAY" INT, "IS_AL_DAY" INT, "IS_SNF_DAY" INT, "IS_DISCHARGED" INT,
    "IS_RTA" INT, "ROOM_KEY" VARCHAR(5), "IS_BED_HOLD" INT, "BILLABLE_DAY" INT,
    "HEAD_IN_BED_DAY" INT, "FINANCIAL_DAY" INT, "IS_ADMISSION" INT, "IS_READMISSION" INT,
    "IS_EXPIRED" INT, "PAYOR_SK" INT, "IS_SUBMISSION" INT, "SPLIT_CENSUS" INT
);
operation successful (2.7s)
COMMIT: transaction is aborted because of concurrency conflicts, will ROLLBACK instead
sql>

----

Generally speaking, our bulk loader reads a chunk of rows into memory (say 10k, 20k, 100k or even 1 million rows), generates a COPY INTO statement that states how many rows are outbound, streams the rows through a BufferedWriter with a newline after each record, and reads from a BufferedReader while waiting for prompts. PDI then fetches another chunk of rows and repeats the process.

I noticed that when using MAPI, appending a COMMIT to the output brought back an error saying that COMMIT is not allowed in auto-commit mode, so it looks like each big COPY INTO statement is committed on its own as soon as its complete pack of rows has been sent.

What I want to figure out is why, after the first COPY INTO of however many rows (10k, for example), I get a message about a failure to commit because of concurrency, with a ROLLBACK instead. Once that happens, is MonetDB in a funk? Do I need to shut it down and restart? What can I do to provide more helpful information to understand the nature of the problem?

I read on http://www.monetdb.org/Documentation/Cookbooks/SQLrecipies/LoadingBulkData that I could supply a much larger guess on the first COPY INTO so MonetDB can prepare itself for a huge influx of rows, even though that call delivers a smaller batch. The example in my mind: I know my table has roughly 6 million rows, so let 6 million be the guess, then make accurate subsequent calls where each COPY INTO is told that 20k rows are on the way, building up to the 6 million I guessed would be needed.

It worries me that this appears to be brittle. We might produce 10 million rows in a source table and want to stream them over. As it stands, it looks like COMMIT concurrency errors would keep coming up as we issue COPY INTOs sequentially until the bulk load finally completes. Two sketches of what I am sending follow below.
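To make the shape of a single batch concrete, here is roughly what one chunk looks like by the time it reaches the server. This is only a sketch: the delimiters and NULL marker are placeholders (the step makes them configurable), and I do not know whether wrapping each chunk in an explicit START TRANSACTION / COMMIT like this is the intended way to make the per-chunk COMMIT legal on an auto-commit connection.

    -- one 20k-row chunk (sketch; delimiters and NULL marker are placeholders)
    START TRANSACTION;
    COPY 20000 RECORDS INTO "testing"."fact_cen_hist"
         FROM STDIN USING DELIMITERS '|','\n' NULL AS '';
    -- ... the 20000 data rows follow here, one record per line,
    --     fields separated by '|' ...
    COMMIT;
    -- the loader then waits for the prompt and repeats with the next chunk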
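And here is the record-count hint pattern as I understood it from the cookbook recipe, again only as a sketch. The 6,000,000 figure is just my estimate of the final table size, and I have not verified that the server is happy to receive fewer rows than the first statement announces.

    -- first chunk: announce the expected grand total so MonetDB can
    -- pre-allocate space for roughly 6 million rows, even though only
    -- 20000 rows are actually sent with this statement
    COPY 6000000 RECORDS INTO "testing"."fact_cen_hist"
         FROM STDIN USING DELIMITERS '|','\n' NULL AS '';
    -- ... first 20000 data rows ...

    -- every later chunk: announce the actual number of rows being sent
    COPY 20000 RECORDS INTO "testing"."fact_cen_hist"
         FROM STDIN USING DELIMITERS '|','\n' NULL AS '';
    -- ... next 20000 data rows ...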
Any ideas or feedback is very much appreciated. I can provide code, more detail, and follow any procedures necessary to help diagnose the issue. Sample data is also available.

Thanks ahead of time,
Brandon Jackson