annotate src/main/java/org/monetdb/mcl/parser/TupleLineParser.java @ 937:d416e9b6b3d0
Update Copyright year.
author | Martin van Dinther <martin.van.dinther@monetdbsolutions.com> |
---|---|
date | Thu, 02 Jan 2025 13:27:58 +0100 |
parents | e890195256ac |
children |
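
For orientation, the tuple line format handled by the annotated parser can be illustrated with a small standalone sketch. It assumes only what the listing itself states: a data row starts with `[`, scanning begins at index 2, fields are separated by a comma followed by a tab, the last field is followed by a tab and `]`, and string fields are wrapped in double quotes with backslash escapes. The class name `TupleLineSketch` and the method `split` are hypothetical and not part of monetdb-java. Unlike the real parser, this simplified version returns the raw field text with quotes and escape sequences left in place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the monetdb-java implementation.
class TupleLineSketch {
	/** Splits a row such as "[ 1,\t\"a\",\tNULL\t]" into its raw field strings. */
	static List<String> split(final String line) {
		if (line.length() < 2 || line.charAt(0) != '[')
			throw new IllegalArgumentException("Expected a data row starting with [");

		final List<String> fields = new ArrayList<>();
		boolean inString = false, escaped = false;
		int cursor = 2;	// start of the current field, after the leading "[ "
		for (int i = 2; i < line.length(); i++) {
			final char c = line.charAt(i);
			if (c == '\\') {
				// toggle, so that \\ does not escape the character after it
				escaped = !escaped;
				continue;
			}
			if (c == '"') {
				if (!inString)
					inString = true;
				else if (!escaped)
					inString = false;
			} else if (c == '\t' && !inString) {
				final boolean separator = line.charAt(i - 1) == ',';
				final boolean last = i + 1 == line.length() - 1 && line.charAt(i + 1) == ']';
				if (separator || last) {
					// exclusive end: drop the comma for ",\t", nothing extra for "\t]"
					final int end = separator ? i - 1 : i;
					fields.add(line.substring(cursor, end));
					cursor = i + 1;
				}
			}
			escaped = false;
		}
		return fields;
	}

	public static void main(final String[] args) {
		// a tab inside a quoted string must not be treated as a separator
		for (final String field : split("[ 1,\t\"a,\tb\",\tNULL\t]"))
			System.out.println(field);
	}
}
```

The `escaped` flag is toggled rather than set, which is the same approach the annotated `parse` method takes to tell an escaped quote (`\"`) from a quote that follows an escaped backslash (`\\"`).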
/*
 * SPDX-License-Identifier: MPL-2.0
 *
 * This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/.
 *
 * Copyright 2024, 2025 MonetDB Foundation;
 * Copyright August 2008 - 2023 MonetDB B.V.;
 * Copyright 1997 - July 2008 CWI.
 */

package org.monetdb.mcl.parser;

/**
 * The TupleLineParser extracts the values from a given tuple.
 * The number of values that are expected is known upfront to speed up
 * allocation and validation.
 *
 * @author Fabian Groffen
 * @author Martin van Dinther
 */
public final class TupleLineParser extends MCLParser {
	private StringBuilder uesc = null;	// used for building field string value when an escape is present in the field value

	/**
	 * Constructs a TupleLineParser which expects columncount columns.
	 * The columncount argument is used for allocation of the public values array.
	 * While this seems illogical, the caller should know this size, since the
	 * StartOfHeader contains this information.
	 *
	 * @param columncount the number of columns in the string to be parsed
	 */
	public TupleLineParser(final int columncount) {
		super(columncount);
	}

	/**
	 * Parses the given String source as tuple line.
	 * If source cannot be parsed, an MCLParseException is thrown.
	 *
	 * @param source a String representing a tuple line which should be parsed
	 * @return 0, as there is no 'type' of TupleLine
	 * @throws MCLParseException if source is not compliant with the expected tuple/single value format
	 */
	@Override
	public int parse(final String source) throws MCLParseException {
		final int len = source.length();
		if (len <= 0)
			throw new MCLParseException("Missing tuple data");

		// first detect whether this is a single value line (=) or a real tuple ([)
		char chr = source.charAt(0);
		if (chr == '=') {
			if (values.length != 1)
				throw new MCLParseException(values.length +
						" columns expected, but only single value found");

			// return the whole string but without the leading =
			values[0] = source.substring(1);

			// reset colnr
			colnr = 0;
			return 0;
		}

		if (chr != '[')
			throw new MCLParseException("Expected a data row starting with [");

		// It is a tuple. Extract separate fields by examining the string data char for char
		// For parsing it is faster to use a char[] to avoid overhead of source.charAt(i) method calls
		final char[] chrLine = source.toCharArray();
		boolean inString = false, escaped = false, fieldHasEscape = false;
		int column = 0, cursor = 2;
		// scan the characters, when a field separator is found extract the field value as String dealing with possible escape characters
		for (int i = 2; i < len; i++) {
			switch(chrLine[i]) {
				case '\\':
					escaped = !escaped;
					fieldHasEscape = true;
					break;
				case '"':
					/**
					 * If all strings are wrapped between two quotes, a \" can
					 * never exist outside a string. Thus if we believe that we
					 * are not within a string, we can safely assume we're about
					 * to enter a string if we find a quote.
					 * If we are in a string we should stop being in a string if
					 * we find a quote which is not prefixed by a \, for that
					 * would be an escaped quote. However, a nasty situation can
					 * occur where the string is like "test \\". Obviously, a
					 * test for a \ in front of a " doesn't hold here for all
					 * cases. Because "test \\\"" can exist as well, we need to
					 * know if a quote is prefixed by an escaping slash or not.
					 */
					if (!inString) {
						inString = true;
					} else if (!escaped) {
						inString = false;
					}
					// reset escaped flag
					escaped = false;
					break;
				case '\t':	// potential field separator found
					if (!inString &&
						((chrLine[i - 1] == ',') ||	// found field separator: ,\t
						 ((i + 1 == len - 1) && chrLine[++i] == ']')))	// found last field: \t]
					{
						// extract the field value as a string, without the potential escape codes
						final int endpos = i - 2;	// minus the tab and the comma or ]
						if (chrLine[cursor] == '"' &&
							chrLine[endpos] == '"')	// field is surrounded by double quotes, so a string with possible escape codes
						{
							cursor++;
							final int fieldlen = endpos - cursor;
							if (fieldHasEscape) {
								if (uesc == null) {
									// first time use, create it with enough capacity, minimum 1024
									uesc = new StringBuilder(Math.max(fieldlen, 1024));
								} else {
									// reuse the StringBuilder by cleaning it
									uesc.setLength(0);
									if (fieldlen > 1024) {
										// prevent multiple capacity increments during the append()'s in the inner loop
										uesc.ensureCapacity(fieldlen);
									}
								}
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
128 // parse the field value (excluding the double quotes) and convert it to a string without any escape characters |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
129 for (int pos = cursor; pos < endpos; pos++) { |
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
130 chr = chrLine[pos]; |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
131 if (chr == '\\' && pos + 1 < endpos) { |
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
132 // we detected an escape |
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
133 // escapedStr and GDKstrFromStr in gdk_atoms.c only |
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
134 // support \\ \f \n \r \t \" and \377 |
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
135 pos++; |
136 chr = chrLine[pos]; |
137 switch (chr) { |
138 case 'f': |
139 uesc.append('\f'); |
140 break; |
141 case 'n': |
142 uesc.append('\n'); |
143 break; |
144 case 'r': |
145 uesc.append('\r'); |
146 break; |
147 case 't': |
148 uesc.append('\t'); |
149 break; |
150 case '0': case '1': case '2': case '3': |
151 // this could be an octal number, let's check it out |
152 if (pos + 2 < endpos) { |
153 char chr2 = chrLine[pos + 1]; |
154 char chr3 = chrLine[pos + 2]; |
155 if (chr2 >= '0' && chr2 <= '7' && chr3 >= '0' && chr3 <= '7') { |
156 // we got an octal number between \000 and \377 |
157 try { |
316
d479475888e3
Replace StringBuilder methods sb.delete(0, sb.length()) with faster sb.setLength(0).
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
297
diff
changeset
|
158 uesc.append((char)(Integer.parseInt(new String(chrLine, pos, 3), 8))); |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
159 pos += 2; |
160 } catch (NumberFormatException e) { |
161 // hmmm, this point should never be reached actually... |
162 throw new AssertionError("Flow error, should never try to parse non-number"); |
163 } |
164 } else { |
165 // do default action if number seems not to be an octal number |
166 uesc.append(chr); |
167 } |
168 } else { |
169 // do default action if number seems not to be an octal number |
170 uesc.append(chr); |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
171 } |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
172 break; |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
173 /* case '\\': optimisation: this code does the same as the default case, so not needed |
174 uesc.append('\\'); |
175 break; |
176 */ |
177 /* case '"': optimisation: this code does the same as the default case, so not needed |
178 uesc.append('"'); |
179 break; |
180 */ |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
181 default: |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
182 // this is wrong usage of escape (except for '\\' and '"'), just ignore the \-escape and print the char |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
183 uesc.append(chr); |
184 break; |
185 } |
186 } else { |
187 uesc.append(chr); |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
188 } |
189 } |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
190 // put the unescaped string in the right place |
191 values[column] = uesc.toString(); |
192 } else { |
193 // the field is a string surrounded by double quotes and without escape chars |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
194 values[column] = new String(chrLine, cursor, fieldlen); |
195 // if (values[column].contains("\\")) { |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
196 // throw new MCLParseException("Invalid parsing: detected a \\ in double quoted string: " + fieldVal); |
197 // } |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
198 } |
199 } else { |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
200 final int vlen = i - 1 - cursor; |
201 if (vlen == 4 && |
202 chrLine[cursor] == 'N' && chrLine[cursor+1] == 'U' && chrLine[cursor+2] == 'L' && chrLine[cursor+3] == 'L') { |
203 // the field contains NULL, so no value |
204 values[column] = null; |
205 } else { |
206 // the field is a string NOT surrounded by double quotes and thus without escape chars |
207 values[column] = new String(chrLine, cursor, vlen); |
208 // if (values[column].contains("\\")) { |
209 // throw new MCLParseException("Invalid parsing: detected a \\ in unquoted string: " + fieldVal); |
210 // } |
211 } |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
212 } |
213 cursor = i + 1; |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
214 fieldHasEscape = false; // reset for next field scan |
215 column++; |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
216 } |
217 // reset escaped flag |
218 escaped = false; |
180
fdf4c888d5b7
Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
173
diff
changeset
|
219 break; |
204
ae1d0d1c2f0f
Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
200
diff
changeset
|
220 default: |
221 escaped = false; |
222 break; |
223 } // end of switch() |
224 } // end of for() |
225 |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
226 // check if this result is of the size we expected it to be |
227 if (column != values.length) |
228 throw new MCLParseException("illegal result length: " + column + "\nlast read: " + (column > 0 ? values[column - 1] : "<none>")); |
229 |
230 // reset colnr |
322
0fcf338ce0b4
Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents:
316
diff
changeset
|
231 colnr = 0; |
0
a5a898f6886c
Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff
changeset
|
232 return 0; |
233 } |
234 } |
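The escape handling in the listing above (source lines 135 to 191) can be hard to follow in annotated form, so here is a minimal standalone sketch of the same idea. It is not the MonetDB implementation: the class name `UnescapeSketch`, the method `unescape` and its parameters are hypothetical, `endpos` is assumed to be an exclusive end index, and a lone trailing backslash is assumed to be copied through unchanged. The octal value is computed with bit shifts rather than the `Integer.parseInt(new String(chrLine, pos, 3), 8)` call used in the original, which avoids the temporary `String` and the `NumberFormatException` branch.

```java
final class UnescapeSketch {
    // true when c is an octal digit '0'..'7'
    private static boolean isOctal(final char c) {
        return c >= '0' && c <= '7';
    }

    /**
     * Decodes backslash escapes found in chrLine[start .. endpos-1].
     * Hypothetical helper; endpos is exclusive (an assumption of this sketch).
     */
    static String unescape(final char[] chrLine, final int start, final int endpos) {
        final StringBuilder uesc = new StringBuilder(endpos - start);
        for (int pos = start; pos < endpos; pos++) {
            char chr = chrLine[pos];
            if (chr != '\\' || pos + 1 >= endpos) {
                // ordinary character (or a lone trailing backslash): copy as is
                uesc.append(chr);
                continue;
            }
            chr = chrLine[++pos]; // the character following the backslash
            switch (chr) {
                case 'f': uesc.append('\f'); break;
                case 'n': uesc.append('\n'); break;
                case 'r': uesc.append('\r'); break;
                case 't': uesc.append('\t'); break;
                case '0': case '1': case '2': case '3':
                    // possibly a three-digit octal escape between \000 and \377
                    if (pos + 2 < endpos && isOctal(chrLine[pos + 1]) && isOctal(chrLine[pos + 2])) {
                        final int value = ((chr - '0') << 6)
                                | ((chrLine[pos + 1] - '0') << 3)
                                | (chrLine[pos + 2] - '0');
                        uesc.append((char) value);
                        pos += 2; // skip the two extra digits just consumed
                    } else {
                        // not an octal escape: drop the backslash, keep the digit
                        uesc.append(chr);
                    }
                    break;
                default:
                    // covers '\\' and '"' as well as unknown escapes: keep the char
                    uesc.append(chr);
                    break;
            }
        }
        return uesc.toString();
    }

    public static void main(final String[] args) {
        final char[] field = "col\\101\\tend".toCharArray();
        System.out.println(unescape(field, 0, field.length));
    }
}
```

As in the original, the backslash and double-quote cases need no branch of their own because the default case already appends the escaped character unchanged.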