Every SQL-based application understands INSERT INTO. For moderate amounts of data its performance is sufficient as long as you take care to avoid some pitfalls. In a nutshell,
Make sure to disable autocommit mode or wrap the INSERT statements in a transaction
When inserting programmatically, see if you can use prepared statements.
Drop all constraints on the table and restore them after the INSERTs.
Set SET sys.optimizer = 'minimal_pipe'
The easiest and most portable way to insert data into a table is to use the INSERT INTO SQL statement.
INSERT INTO my_table VALUES (1, 2, 3);
If table my_table
has three columns, this statement inserts a new row containing
the given values. If the table has more columns, the additional columns are set to
NULL or a default value.
It is possible to insert multiple rows using a single statement:
INSERT INTO my_table VALUES (1, 2, 3), (4, 5, 6);
This adds two rows. It is not recommended to have more than at most a few dozen rows per INSERT statement as the parsing overhead will quickly become a bottleneck.
By default, MonetDB runs in autocommit mode which means that every statement
executes in a separate transaction. This adds considerable overhead if the
amount of work per statement is very small, such as with an INSERT statement.
Wrapping the sequence of inserts between a START TRANSACTION
statement and a
COMMIT
makes it a single transaction so the overhead is spread out over all
the inserts.
As an alternative to START TRANSACTION you can also disable autocommit mode at
the connection level. With mclient, this is done using the -a
flag.
When using JDBC (Java), use
connection.setAutoCommit(false);
.
With ODBC,
use the function SQLSetConnectAttr()
.
With pymonetd
autocommit is already disabled by default.
If your language bindings support it you can use
prepared statements
to reduce the per-insert parsing and verification overhead. A prepared statement allows
you to provide the INSERT statement once, with question marks ?
used as placeholders
for the values to insert. After preparing, you then repeatedly bind the placeholders to
new values and execute the prepared statement.
Some language bindings provide special support for prepared statements. For example, in Java:
connection.setAutoCommit(false);
PreparedStatement stmt = connection.prepareStatement("INSERT INTO table VALUES (?, ?, ?)");
// repeatedly:
stmt.setInt(1, /* a value */);
stmt.setInt(2, /* a value */);
stmt.setInt(3, /* a value */);
stmt.executeUpdate();
// finally:
connection.commit();
With ODBC, using prepared statements involves a call SQLPrepare(hdl, "INSERT INTO table VALUES (?, ?, ?)", SQL_NTS)
followed by repeated calls to,
SQLBindParameter()
and SQLExecute()
.
Table constraints such as PRIMARY KEY, UNIQUE and FOREIGN KEY are verified on every INSERT. This becomes increasingly time consuming.
Because checking constraints in bulk is much more efficient than repeatedly checking them for a single value, it is often more efficient to simply drop the constraints before the insertions using ALTER TABLE ... DROP CONSTRAINT, then recreate them after using ALTER TABLE ... ADD CONSTRAINT.
Before execution, every SQL statement is translated into an internal language called MAL. After translation and before execution the system analyzes the MAL plan and tries to make improvements to it. This is very effective for SELECT queries, especially complex ones, but there is not much to be optimized for INSERT plans.
If you're executing many INSERTs, disabling the MAL optimizer can make a large difference. To do so, execute
SET sys.optimizer = 'minimal_pipe';
This setting applies to the current connection only, so after completing the INSERTs, close the connection or execute
SET sys.optimizer = 'default_pipe';
A small unscientific benchmark loading a table with 45529 rows and 109 columns, mostly varchar and decimal:
EXPERIMENT | AVERAGE | SPEEDUP |
---|---|---|
INSERT without transaction | 413.00s | 0.10× |
INSERT default_pipe | 43.16s | 1.00× |
PREPARE INSERT default_pipe | 24.35s | 1.77× |
INSERT minimal_pipe | 22.27s | 1.94× |
PREPARE INSERT minimal_pipe | 14.86s | 2.91× |
COPY INTO | 0.73s | 59.4× |
The final row is for comparison with the next data loading method we'll describe, loading CSV data using COPY INTO.