annotate src/main/java/org/monetdb/mcl/parser/TupleLineParser.java @ 937:d416e9b6b3d0

Update Copyright year.
author Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
date Thu, 02 Jan 2025 13:27:58 +0100 (4 months ago)
parents e890195256ac
children
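As a rough illustration of the approach taken in the parse method annotated below, the following standalone sketch splits a tuple line on the ",\t" and "\t]" separators while tracking quote and backslash state. The class name TupleLineSketch and its methods split and unescape are hypothetical names invented for this example. It is not the MonetDB implementation, and it deliberately omits the octal escapes and the StringBuilder reuse seen in the listing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified sketch of tuple line splitting; not the MonetDB code. */
final class TupleLineSketch {

    /** Splits a line of the form "[ v1,\tv2\t]" into unescaped field values. */
    static List<String> split(final String line) {
        final int len = line.length();
        if (len < 2 || line.charAt(0) != '[')
            throw new IllegalArgumentException("Expected a data row starting with [");

        final List<String> fields = new ArrayList<>();
        boolean inString = false, escaped = false;
        int cursor = 2; // start of the current field, just past "[ "
        for (int i = 2; i < len; i++) {
            switch (line.charAt(i)) {
                case '\\':
                    // toggling makes "\\" count as one escaped backslash, not an open escape
                    escaped = !escaped;
                    break;
                case '"':
                    if (!inString)
                        inString = true;
                    else if (!escaped)
                        inString = false; // unescaped quote closes the string
                    escaped = false;
                    break;
                case '\t':
                    if (!inString) {
                        final boolean afterComma = line.charAt(i - 1) == ',';
                        final boolean beforeClose = (i + 1 == len - 1) && line.charAt(i + 1) == ']';
                        if (afterComma || beforeClose) {
                            // exclusive end: drop the comma for ",\t", keep everything before "\t]"
                            final int end = afterComma ? i - 1 : i;
                            fields.add(unescape(line.substring(cursor, end)));
                            cursor = i + 1;
                        }
                    }
                    escaped = false;
                    break;
                default:
                    escaped = false;
                    break;
            }
        }
        return fields;
    }

    /** Strips surrounding double quotes and resolves \\ \" \f \n \r \t (octal escapes omitted). */
    static String unescape(final String field) {
        final int n = field.length();
        if (n < 2 || field.charAt(0) != '"' || field.charAt(n - 1) != '"')
            return field; // not a quoted string: return it unchanged
        final StringBuilder sb = new StringBuilder(n);
        for (int pos = 1; pos < n - 1; pos++) {
            char c = field.charAt(pos);
            if (c == '\\' && pos + 1 < n - 1) {
                c = field.charAt(++pos);
                switch (c) {
                    case 'f': c = '\f'; break;
                    case 'n': c = '\n'; break;
                    case 'r': c = '\r'; break;
                    case 't': c = '\t'; break;
                    default: break; // \\ and \" map to the escaped character itself
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, TupleLineSketch.split("[ 1,\t\"a\\\"b\"\t]") yields the two values 1 and a"b. The escaped flag is toggled rather than set, so a string ending in an escaped backslash is still closed by the quote that follows it.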
rev   line source
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
1 /*
833
e890195256ac Update copyright for the new year, move to MonetDB Foundation, add SPDX.
Sjoerd Mullender <sjoerd@acm.org>
parents: 814
diff changeset
2 * SPDX-License-Identifier: MPL-2.0
e890195256ac Update copyright for the new year, move to MonetDB Foundation, add SPDX.
Sjoerd Mullender <sjoerd@acm.org>
parents: 814
diff changeset
3 *
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
4 * This Source Code Form is subject to the terms of the Mozilla Public
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
5 * License, v. 2.0. If a copy of the MPL was not distributed with this
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
6 * file, You can obtain one at http://mozilla.org/MPL/2.0/.
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
7 *
937
d416e9b6b3d0 Update Copyright year.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 833
diff changeset
8 * Copyright 2024, 2025 MonetDB Foundation;
833
e890195256ac Update copyright for the new year, move to MonetDB Foundation, add SPDX.
Sjoerd Mullender <sjoerd@acm.org>
parents: 814
diff changeset
9 * Copyright August 2008 - 2023 MonetDB B.V.;
e890195256ac Update copyright for the new year, move to MonetDB Foundation, add SPDX.
Sjoerd Mullender <sjoerd@acm.org>
parents: 814
diff changeset
10 * Copyright 1997 - July 2008 CWI.
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
11 */
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
12
391
f523727db392 Moved Java classes from packages starting with nl.cwi.monetdb.* to package org.monetdb.*
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 350
diff changeset
13 package org.monetdb.mcl.parser;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
14
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
15 /**
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
16 * The TupleLineParser extracts the values from a given tuple.
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
17 * The number of expected values is known upfront to speed up
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
18 * allocation and validation.
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
19 *
194
1296dbcc4958 Resolved javadoc many errors and warnings, such as:
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 180
diff changeset
20 * @author Fabian Groffen
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
21 * @author Martin van Dinther
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
22 */
297
bb273e9c7e09 Add "final" keyword to classes, method arguments and local variables where possible.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 261
diff changeset
23 public final class TupleLineParser extends MCLParser {
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
24 private StringBuilder uesc = null; // used for building field string value when an escape is present in the field value
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
25
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
26 /**
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
27 * Constructs a TupleLineParser which expects columncount columns.
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
28 * The columncount argument is used for allocation of the public values array.
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
29 * While this seems illogical, the caller should know this size, since the
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
30 * StartOfHeader contains this information.
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
31 *
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
32 * @param columncount the number of columns in the string to be parsed
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
33 */
297
bb273e9c7e09 Add "final" keyword to classes, method arguments and local variables where possible.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 261
diff changeset
34 public TupleLineParser(final int columncount) {
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
35 super(columncount);
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
36 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
37
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
38 /**
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
39 * Parses the given String source as a tuple line.
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
40 * If source cannot be parsed, an MCLParseException is thrown.
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
41 *
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
42 * @param source a String representing a tuple line which should be parsed
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
43 * @return 0, as there is no 'type' of TupleLine
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
44 * @throws MCLParseException if source does not comply with the expected tuple/single value format
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
45 */
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
46 @Override
297
bb273e9c7e09 Add "final" keyword to classes, method arguments and local variables where possible.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 261
diff changeset
47 public int parse(final String source) throws MCLParseException {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
48 final int len = source.length();
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
49 if (len <= 0)
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
50 throw new MCLParseException("Missing tuple data");
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
51
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
52 // first detect whether this is a single value line (=) or a real tuple ([)
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
53 char chr = source.charAt(0);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
54 if (chr == '=') {
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
55 if (values.length != 1)
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
56 throw new MCLParseException(values.length +
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
57 " columns expected, but only single value found");
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
58
173
e5c67a23d7d6 Fix for bug 6350
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 90
diff changeset
59 // return the whole string but without the leading =
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
60 values[0] = source.substring(1);
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
61
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
62 // reset colnr
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
63 colnr = 0;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
64 return 0;
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
65 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
66
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
67 if (chr != '[')
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
68 throw new MCLParseException("Expected a data row starting with [");
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
69
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
70 // It is a tuple. Extract separate fields by examining the string data character by character
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
71 // For parsing it is faster to use a char[] to avoid the overhead of source.charAt(i) method calls
316
d479475888e3 Replace StringBuilder methods sb.delete(0, sb.length()) with faster sb.setLength(0).
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 297
diff changeset
72 final char[] chrLine = source.toCharArray();
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
73 boolean inString = false, escaped = false, fieldHasEscape = false;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
74 int column = 0, cursor = 2;
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
75 // scan the characters; when a field separator is found, extract the field value as a String, dealing with possible escape characters
173
e5c67a23d7d6 Fix for bug 6350
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 90
diff changeset
76 for (int i = 2; i < len; i++) {
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
77 switch(chrLine[i]) {
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
78 case '\\':
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
79 escaped = !escaped;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
80 fieldHasEscape = true;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
81 break;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
82 case '"':
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
83 /**
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
84 * If all strings are wrapped between two quotes, a \" can
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
85 * never exist outside a string. Thus if we believe that we
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
86 * are not within a string, we can safely assume we're about
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
87 * to enter a string if we find a quote.
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
88 * If we are in a string we should stop being in a string if
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
89 * we find a quote which is not prefixed by a \, for that
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
90 * would be an escaped quote. However, a nasty situation can
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
91 * occur where the string is like "test \\". Obviously, a
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
92 * test for a \ in front of a " doesn't hold here for all
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
93 * cases. Because "test \\\"" can exist as well, we need to
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
94 * know if a quote is prefixed by an escaping backslash or not.
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
95 */
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
96 if (!inString) {
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
97 inString = true;
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
98 } else if (!escaped) {
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
99 inString = false;
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
100 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
101 // reset escaped flag
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
102 escaped = false;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
103 break;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
104 case '\t': // potential field separator found
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
105 if (!inString &&
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
106 ((chrLine[i - 1] == ',') || // found field separator: ,\t
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
107 ((i + 1 == len - 1) && chrLine[++i] == ']'))) // found last field: \t]
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
108 {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
109 // extract the field value as a string, without the potential escape codes
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
110 final int endpos = i - 2; // minus the tab and the comma or ]
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
111 if (chrLine[cursor] == '"' &&
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
112 chrLine[endpos] == '"') // field is surrounded by double quotes, so a string with possible escape codes
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
113 {
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
114 cursor++;
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
115 final int fieldlen = endpos - cursor;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
116 if (fieldHasEscape) {
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
117 if (uesc == null) {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
118 // first time use, create it with enough capacity, minimum 1024
814
1344603ee8af Use intrinsics rather than manual flow control
Joeri van Ruth <joeri.van.ruth@monetdbsolutions.com>
parents: 716
diff changeset
119 uesc = new StringBuilder(Math.max(fieldlen, 1024));
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
120 } else {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
121 // reuse the StringBuilder by clearing it
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
122 uesc.setLength(0);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
123 if (fieldlen > 1024) {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
124 // prevent multiple capacity increments during the append() calls in the inner loop
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
125 uesc.ensureCapacity(fieldlen);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
126 }
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
127 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
128 // parse the field value (excluding the double quotes) and convert it to a string without any escape characters
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
129 for (int pos = cursor; pos < endpos; pos++) {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
130 chr = chrLine[pos];
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
131 if (chr == '\\' && pos + 1 < endpos) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
132 // we detected an escape
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
133 // escapedStr and GDKstrFromStr in gdk_atoms.c only
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
134 // support \\ \f \n \r \t \" and \377
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
135 pos++;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
136 chr = chrLine[pos];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
137 switch (chr) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
138 case 'f':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
139 uesc.append('\f');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
140 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
141 case 'n':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
142 uesc.append('\n');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
143 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
144 case 'r':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
145 uesc.append('\r');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
146 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
147 case 't':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
148 uesc.append('\t');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
149 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
150 case '0': case '1': case '2': case '3':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
151 // this could be an octal number, let's check it out
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
152 if (pos + 2 < endpos) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
153 char chr2 = chrLine[pos + 1];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
154 char chr3 = chrLine[pos + 2];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
155 if (chr2 >= '0' && chr2 <= '7' && chr3 >= '0' && chr3 <= '7') {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
156 // we got an octal number between \000 and \377
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
157 try {
316
d479475888e3 Replace StringBuilder methods sb.delete(0, sb.length()) with faster sb.setLength(0).
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 297
diff changeset
158 uesc.append((char)(Integer.parseInt(new String(chrLine, pos, 3), 8)));
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
159 pos += 2;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
160 } catch (NumberFormatException e) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
161 // hmmm, this point should never be reached actually...
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
162 throw new AssertionError("Flow error, should never try to parse non-number");
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
163 }
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
164 } else {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
165 // do default action if number seems not to be an octal number
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
166 uesc.append(chr);
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
167 }
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
168 } else {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
169 // do default action if number seems not to be an octal number
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
170 uesc.append(chr);
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
171 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
172 break;
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
173 /* case '\\': optimisation: this code does the same as the default case, so not needed
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
174 uesc.append('\\');
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
175 break;
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
176 */
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
177 /* case '"': optimisation: this code does the same as the default case, so not needed
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
178 uesc.append('"');
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
179 break;
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
180 */
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
181 default:
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
182 // this is wrong usage of escape (except for '\\' and '"'), just ignore the \-escape and print the char
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
183 uesc.append(chr);
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
184 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
185 }
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
186 } else {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
187 uesc.append(chr);
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
188 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
189 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
190 // put the unescaped string in the right place
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
191 values[column] = uesc.toString();
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
192 } else {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
193 // the field is a string surrounded by double quotes and without escape chars
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
194 values[column] = new String(chrLine, cursor, fieldlen);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
195 // if (values[column].contains("\\")) {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
196 // throw new MCLParseException("Invalid parsing: detected a \\ in double quoted string: " + fieldVal);
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
197 // }
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
198 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
199 } else {
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
200 final int vlen = i - 1 - cursor;
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
201 if (vlen == 4 &&
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
202 chrLine[cursor] == 'N' && chrLine[cursor+1] == 'U' && chrLine[cursor+2] == 'L' && chrLine[cursor+3] == 'L') {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
203 // the field contains NULL, so no value
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
204 values[column] = null;
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
205 } else {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
206 // the field is a string NOT surrounded by double quotes and thus without escape chars
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
207 values[column] = new String(chrLine, cursor, vlen);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
208 // if (values[column].contains("\\")) {
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
209 // throw new MCLParseException("Invalid parsing: detected a \\ in unquoted string: " + fieldVal);
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
210 // }
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
211 }
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
212 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
213 cursor = i + 1;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
214 fieldHasEscape = false; // reset for next field scan
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
215 column++;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
216 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
217 // reset escaped flag
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
218 escaped = false;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
219 break;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
220 default:
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
221 escaped = false;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
222 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
223 } // end of switch()
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
224 } // end of for()
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
225
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
226 // check if this result is of the size we expected it to be
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
227 if (column != values.length)
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
228 throw new MCLParseException("illegal result length: " + column + "\nlast read: " + (column > 0 ? values[column - 1] : "<none>"));
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
229
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
230 // reset colnr
322
0fcf338ce0b4 Optimized parse method of TupleLineParser by creating less helper objects and replacing method calls by direct operations on variables.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 316
diff changeset
231 colnr = 0;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
232 return 0;
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
233 }
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
234 }
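
The escape handling in the switch above (lines 137 to 185 of the source) can be tried in isolation. The following is a minimal sketch, not the parser's actual code: the class name `UnescapeSketch` and the methods `unescape` and `isOctal` are hypothetical, the input is assumed to be the content of a double-quoted field with the surrounding quotes already removed, and the octal value is computed arithmetically instead of through `Integer.parseInt(..., 8)`. How a lone trailing backslash is treated is also an assumption, since that part of the loop is not shown in this section.

```java
// Hypothetical stand-alone sketch of the unescape rules shown in the switch above.
final class UnescapeSketch {

    /** Returns the field content with backslash escapes resolved. */
    static String unescape(final String field) {
        final char[] chrLine = field.toCharArray();
        final int endpos = chrLine.length;
        final StringBuilder uesc = new StringBuilder(endpos);

        for (int pos = 0; pos < endpos; pos++) {
            char chr = chrLine[pos];
            // Ordinary character, or a lone backslash at the very end
            // (assumption: kept literally): copy it unchanged.
            if (chr != '\\' || pos + 1 >= endpos) {
                uesc.append(chr);
                continue;
            }

            // Skip the backslash and look at the escaped character.
            chr = chrLine[++pos];
            switch (chr) {
                case 'f': uesc.append('\f'); break;
                case 'n': uesc.append('\n'); break;
                case 'r': uesc.append('\r'); break;
                case 't': uesc.append('\t'); break;
                case '0': case '1': case '2': case '3':
                    // Possibly a three-digit octal escape between \000 and \377.
                    if (pos + 2 < endpos
                            && isOctal(chrLine[pos + 1])
                            && isOctal(chrLine[pos + 2])) {
                        final int code = (chr - '0') * 64
                                + (chrLine[pos + 1] - '0') * 8
                                + (chrLine[pos + 2] - '0');
                        uesc.append((char) code);
                        pos += 2; // the two extra digits are consumed
                    } else {
                        // Not a complete octal number: emit the digit itself.
                        uesc.append(chr);
                    }
                    break;
                default:
                    // Covers '\\' and '"' as well as unknown escapes:
                    // drop the backslash and emit the character.
                    uesc.append(chr);
                    break;
            }
        }
        return uesc.toString();
    }

    private static boolean isOctal(final char c) {
        return c >= '0' && c <= '7';
    }
}
```

Under these assumptions, `UnescapeSketch.unescape("\\101BC")` resolves the octal escape to `A` and returns `ABC`, while an incomplete sequence such as `\19` falls through to the default behaviour and yields the digits unchanged.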