annotate src/main/java/nl/cwi/monetdb/mcl/parser/TupleLineParser.java @ 261:d4baf8a4b43a

Update Copyright year to 2019
author Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
date Thu, 03 Jan 2019 14:43:44 +0100 (2019-01-03)
parents ae1d0d1c2f0f
children bb273e9c7e09
rev   line source
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
1 /*
2 * This Source Code Form is subject to the terms of the Mozilla Public
3 * License, v. 2.0. If a copy of the MPL was not distributed with this
4 * file, You can obtain one at http://mozilla.org/MPL/2.0/.
5 *
261
d4baf8a4b43a Update Copyright year to 2019
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 204
diff changeset
6 * Copyright 1997 - July 2008 CWI, August 2008 - 2019 MonetDB B.V.
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
7 */
8
9 package nl.cwi.monetdb.mcl.parser;
10
11 /**
12 * The TupleLineParser extracts the values from a given tuple. The
13 * number of expected values is known upfront to speed up
14 * allocation and validation.
15 *
194
1296dbcc4958 Resolved javadoc many errors and warnings, such as:
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 180
diff changeset
16 * @author Fabian Groffen
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
17 */
18 public class TupleLineParser extends MCLParser {
19 /**
20 * Constructs a TupleLineParser which expects columncount columns.
21 *
22 * @param columncount the number of columns in the string to be parsed
23 */
24 public TupleLineParser(int columncount) {
25 super(columncount);
26 }
27
28 /**
29 * Parses the given String source as a tuple line. If source cannot
30 * be parsed, an MCLParseException is thrown. The columncount passed
31 * to the constructor is used for allocation of the values array. While
32 * this seems illogical, the caller should know this size, since the
33 * StartOfHeader contains this information.
34 *
35 * @param source a String which should be parsed
36 * @return 0, as there is no 'type' of TupleLine
194
1296dbcc4958 Resolved javadoc many errors and warnings, such as:
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 180
diff changeset
37 * @throws MCLParseException if an error occurs during parsing
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
38 */
39 @Override
40 public int parse(String source) throws MCLParseException {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
41 final int len = source.length();
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
42 // first detect whether this is a single value line (=) or a
43 // real tuple ([)
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
44 if (len >= 1 && source.charAt(0) == '=') {
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
45 if (values.length != 1)
46 throw new MCLParseException(values.length +
47 " columns expected, but only single value found");
48
173
e5c67a23d7d6 Fix for bug 6350
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 90
diff changeset
49 // return the whole string but without the leading =
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
50 values[0] = source.substring(1);
51
52 // reset colnr
53 reset();
54 return 0;
55 }
56
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
57 if (!source.startsWith("["))
58 throw new MCLParseException("Expected a data row starting with [");
59
60 // It is a tuple. Extract separate fields by examining the string data char by char
61 final char[] chrLine = source.toCharArray(); // convert whole string to char[] to avoid overhead of source.charAt(i) calls TODO: measure the overhead
62 boolean inString = false, escaped = false, fieldHasEscape = false;
63 final StringBuilder uesc = new StringBuilder(128); // used for building field string value when an escape is present in the field value
64 int column = 0, cursor = 2;
173
e5c67a23d7d6 Fix for bug 6350
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 90
diff changeset
65 for (int i = 2; i < len; i++) {
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
66 switch(chrLine[i]) {
67 case '\\':
68 escaped = !escaped;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
69 fieldHasEscape = true;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
70 break;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
71 case '"':
72 /**
73 * If all strings are wrapped between two quotes, a \" can
74 * never exist outside a string. Thus if we believe that we
75 * are not within a string, we can safely assume we're about
76 * to enter a string if we find a quote.
77 * If we are in a string we should stop being in a string if
78 * we find a quote which is not prefixed by a \, for that
79 * would be an escaped quote. However, a nasty situation can
80 * occur where the string is like "test \\". Obviously, a
81 * test for a \ in front of a " doesn't hold here for all
82 * cases. Because "test \\\"" can exist as well, we need to
83 * know if a quote is prefixed by an escaping backslash or not.
84 */
85 if (!inString) {
86 inString = true;
87 } else if (!escaped) {
88 inString = false;
89 }
90 // reset escaped flag
91 escaped = false;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
92 break;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
93 case '\t': // potential field separator found
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
94 if (!inString &&
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
95 ((chrLine[i - 1] == ',') || // found field separator: ,\t
96 ((i + 1 == len - 1) && chrLine[++i] == ']'))) // found last field: \t]
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
97 {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
98 // extract the field value as a string, without the potential escape codes
99 final int endpos = i - 2; // minus the tab and the comma or ]
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
100 if (chrLine[cursor] == '"' &&
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
101 chrLine[endpos] == '"') // field is surrounded by double quotes, so a string with possible escape codes
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
102 {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
103 if (fieldHasEscape) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
104 // reuse the StringBuilder by cleaning it
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
105 uesc.delete(0, uesc.length());
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
106 // prevent capacity increasements
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
107 uesc.ensureCapacity(endpos - (cursor + 1));
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
108 // parse the field value (excluding the double quotes) and convert it to a string without any escape characters
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
109 for (int pos = cursor + 1; pos < endpos; pos++) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
110 char chr = chrLine[pos];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
111 if (chr == '\\' && pos + 1 < endpos) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
112 // we detected an escape
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
113 // escapedStr and GDKstrFromStr in gdk_atoms.c only
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
114 // support \\ \f \n \r \t \" and \377
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
115 pos++;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
116 chr = chrLine[pos];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
117 switch (chr) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
118 case '\\':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
119 uesc.append('\\');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
120 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
121 case 'f':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
122 uesc.append('\f');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
123 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
124 case 'n':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
125 uesc.append('\n');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
126 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
127 case 'r':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
128 uesc.append('\r');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
129 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
130 case 't':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
131 uesc.append('\t');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
132 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
133 case '"':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
134 uesc.append('"');
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
135 break;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
136 case '0': case '1': case '2': case '3':
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
137 // this could be an octal number, let's check it out
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
138 if (pos + 2 < endpos) {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
139 char chr2 = chrLine[pos + 1];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
140 char chr3 = chrLine[pos + 2];
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
141 if (chr2 >= '0' && chr2 <= '7' && chr3 >= '0' && chr3 <= '7') {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
142 // we got an octal number between \000 and \377
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
143 try {
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
144 uesc.append((char)(Integer.parseInt("" + chr + chr2 + chr3, 8)));
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
145 pos += 2;
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
146 } catch (NumberFormatException e) {
147 // hmmm, this point should never be reached actually...
148 throw new AssertionError("Flow error, should never try to parse non-number");
149 }
150 } else {
151 // do default action if number seems not to be an octal number
152 uesc.append(chr);
153 }
154 } else {
155 // do default action if number seems not to be an octal number
156 uesc.append(chr);
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
157 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
158 break;
159 default:
160 // this is wrong usage of escape, just ignore the \-escape and print the char
161 uesc.append(chr);
162 break;
163 }
164 } else {
165 uesc.append(chr);
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
166 }
167 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
168 // put the unescaped string in the right place
169 values[column] = uesc.toString();
170 } else {
171 // the field is a string surrounded by double quotes and without escape chars
172 cursor++;
173 String fieldVal = new String(chrLine, cursor, endpos - cursor);
174 // if (fieldVal.contains("\\")) {
175 // throw new MCLParseException("Invalid parsing: detected a \\ in double quoted string: " + fieldVal);
176 // }
177 values[column] = fieldVal;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
178 }
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
179 } else if (((i - 1 - cursor) == 4) && source.indexOf("NULL", cursor) == cursor) {
180 // the field contains NULL, so no value
181 values[column] = null;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
182 } else {
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
183 // the field is a string NOT surrounded by double quotes and thus without escape chars
184 String fieldVal = new String(chrLine, cursor, i - 1 - cursor);
185 // if (fieldVal.contains("\\")) {
186 // throw new MCLParseException("Invalid parsing: detected a \\ in unquoted string: " + fieldVal);
187 // }
188 values[column] = fieldVal;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
189 }
190 cursor = i + 1;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
191 fieldHasEscape = false; // reset for next field scan
192 column++;
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
193 }
194 // reset escaped flag
195 escaped = false;
180
fdf4c888d5b7 Small code and layout improvements
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 173
diff changeset
196 break;
204
ae1d0d1c2f0f Optimize TupleLineParser by doing less copying of string data when field value does not contain an escape character, which is the case for most strings.
Martin van Dinther <martin.van.dinther@monetdbsolutions.com>
parents: 200
diff changeset
197 default:
198 escaped = false;
199 break;
200 } // end of switch()
201 } // end of for()
202
0
a5a898f6886c Copy of MonetDB java directory changeset e6e32756ad31.
Sjoerd Mullender <sjoerd@acm.org>
parents:
diff changeset
203 // check if this result is of the size we expected it to be
204 if (column != values.length)
205 throw new MCLParseException("illegal result length: " + column + "\nlast read: " + (column > 0 ? values[column - 1] : "<none>"));
206
207 // reset colnr
208 reset();
209 return 0;
210 }
211 }
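
The field handling in the listing above can be condensed into a small standalone sketch. This is not the MonetDB class itself: the names `UnescapeSketch`, `unescape`, and `fieldValue` are hypothetical, and the named escapes (`\n`, `\t`, `\\`, `\"`) and the treatment of a trailing lone backslash are assumptions, since the excerpt only shows the octal branch and the default branch. It illustrates three behaviours visible in the source: a three-digit octal escape between `\000` and `\377` becomes one character, an unrecognised escape drops the backslash and keeps the character, and an unquoted `NULL` becomes a Java `null`.

```java
final class UnescapeSketch {
    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    /** Decodes backslash escapes in the text found between the quotes of a field. */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int pos = 0; pos < raw.length(); pos++) {
            char chr = raw.charAt(pos);
            // Assumption: a backslash at the very end is copied as-is.
            if (chr != '\\' || pos + 1 >= raw.length()) {
                out.append(chr);
                continue;
            }
            chr = raw.charAt(++pos); // the character after the backslash
            switch (chr) {
                // Assumed set of named escapes; not shown in the excerpt.
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case '\\': out.append('\\'); break;
                case '"': out.append('"'); break;
                case '0': case '1': case '2': case '3':
                    if (pos + 2 < raw.length()
                            && isOctal(raw.charAt(pos + 1))
                            && isOctal(raw.charAt(pos + 2))) {
                        // Three octal digits between \000 and \377.
                        int code = ((chr - '0') << 6)
                                | ((raw.charAt(pos + 1) - '0') << 3)
                                | (raw.charAt(pos + 2) - '0');
                        out.append((char) code);
                        pos += 2;
                    } else {
                        // Not a complete octal number: keep the digit itself.
                        out.append(chr);
                    }
                    break;
                default:
                    // Unknown escape: ignore the backslash, keep the character.
                    out.append(chr);
                    break;
            }
        }
        return out.toString();
    }

    /** Maps one raw field token to its value; an unquoted NULL yields null. */
    static String fieldValue(String token) {
        int len = token.length();
        if (len >= 2 && token.charAt(0) == '"' && token.charAt(len - 1) == '"') {
            String inner = token.substring(1, len - 1);
            // Only pay for unescaping when a backslash is actually present.
            return inner.indexOf('\\') >= 0 ? unescape(inner) : inner;
        }
        if (token.equals("NULL")) {
            return null;
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(fieldValue("\"\\101BC\""));
        System.out.println(fieldValue("NULL"));
    }
}
```

The bit-shift arithmetic replaces the `Integer.parseInt(..., 8)` call of the original, so no `NumberFormatException` path is needed once the digits have been range-checked. Skipping `unescape` when the field holds no backslash mirrors the optimization described in the changeset message.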