org.apache.commons.csv
public class CSVParser extends java.lang.Object
Parses CSV input according to the settings of a CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data = (new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data = (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
Internal parser state is completely covered by the strategy and the reader-state.
See the package documentation for more details.
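The result shape described above (an array of records, each an array of values) can be illustrated without the library. The following is only a minimal stand-in, not the CSVParser implementation: the class `RecordShapeSketch` and its `readAll` method are hypothetical names, encapsulators are ignored, and a comment is assumed to be a line whose first character is the comment marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in showing the String[][] result shape; NOT the CSVParser source. */
final class RecordShapeSketch {

    /** Reads all remaining lines: one String[] per record, one String per value. */
    static String[][] readAll(Reader input, char delimiter, char commentStart) throws IOException {
        List<String[]> records = new ArrayList<>();
        BufferedReader in = new BufferedReader(input);
        // Quote the delimiter so regex metacharacters such as '|' are taken literally.
        String splitOn = Pattern.quote(String.valueOf(delimiter));
        String line;
        while ((line = in.readLine()) != null) {
            // Assumption of this sketch: a comment line starts with commentStart.
            if (!line.isEmpty() && line.charAt(0) == commentStart) {
                continue;
            }
            // A limit of -1 keeps trailing empty values.
            records.add(line.split(splitOn, -1));
        }
        return records.toArray(new String[0][]);
    }

    public static void main(String[] args) throws IOException {
        String[][] data = readAll(new StringReader("a\tb\n# note\nc\td"), '\t', '#');
        System.out.println(Arrays.deepToString(data)); // prints [[a, b], [c, d]]
    }
}
```

The real parser additionally handles encapsulated values and escapes, which this sketch deliberately leaves out.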
Modifier and Type | Class and Description |
---|---|
(package private) static class | CSVParser.Token: Token is an internal token representation. |
Modifier and Type | Field and Description |
---|---|
private CharBuffer | code |
private static java.lang.String[] | EMPTY_STRING_ARRAY: Immutable empty String array. |
private ExtendedBufferedReader | in |
private static int | INITIAL_TOKEN_LENGTH: length of the initial token (content-)buffer |
private java.util.ArrayList | record: A record buffer for getLine(). |
private CSVParser.Token | reusableToken |
private CSVStrategy | strategy |
protected static int | TT_EOF: Token (which can have content) when end of file is reached. |
protected static int | TT_EORECORD: Token with content when end of a line is reached. |
protected static int | TT_INVALID: Token has no valid content, i.e. |
protected static int | TT_TOKEN: Token with content, at beginning or in the middle of a line. |
private CharBuffer | wsBuf |
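The four TT_* constants describe why a token ended, which is what lets a caller assemble tokens into records. The sketch below illustrates that idea under stated assumptions: `TokenTypeSketch`, its nested `Token` class and both methods are hypothetical, an enum replaces the int constants because their numeric values are not given here, only unencapsulated values and `\n` line ends are handled, and returning `null` at end of input is a choice of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the four token types; NOT the CSVParser source. */
final class TokenTypeSketch {

    /** Mirrors TT_INVALID, TT_TOKEN, TT_EOF and TT_EORECORD (numeric values not assumed). */
    enum Type { INVALID, TOKEN, EOF, EORECORD }

    static final class Token {
        Type type = Type.INVALID; // initial state: no valid content yet
        final StringBuilder content = new StringBuilder();
    }

    /** Reads one unencapsulated value and records why it ended. */
    static Token nextToken(Reader in, char delimiter) throws IOException {
        Token tkn = new Token();
        while (true) {
            int c = in.read();
            if (c == -1) {                 // end of input; the token may still carry content
                tkn.type = Type.EOF;
                return tkn;
            }
            if (c == '\n') {               // last value of the current line
                tkn.type = Type.EORECORD;
                return tkn;
            }
            if (c == delimiter) {          // more values follow on this line
                tkn.type = Type.TOKEN;
                return tkn;
            }
            tkn.content.append((char) c);
        }
    }

    /** Collects values until the record ends; returns null when no input is left. */
    static String[] readRecord(Reader in, char delimiter) throws IOException {
        List<String> record = new ArrayList<>();
        while (true) {
            Token tkn = nextToken(in, delimiter);
            if (tkn.type == Type.EOF && record.isEmpty() && tkn.content.length() == 0) {
                return null;               // nothing was read before end of input
            }
            record.add(tkn.content.toString());
            if (tkn.type != Type.TOKEN) {  // EORECORD or EOF closes the record
                return record.toArray(new String[0]);
            }
        }
    }
}
```

Calling `readRecord` repeatedly on a reader over `"a;b\nc;d"` with `';'` as delimiter yields the records `{"a", "b"}` and `{"c", "d"}`, then `null`.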
Constructor and Description |
---|
CSVParser(java.io.InputStream input): Deprecated. Use CSVParser(Reader). |
CSVParser(java.io.Reader input): CSV parser using the default CSVStrategy. |
CSVParser(java.io.Reader input, char delimiter): Deprecated. |
CSVParser(java.io.Reader input, char delimiter, char encapsulator, char commentStart): Deprecated. |
CSVParser(java.io.Reader input, CSVStrategy strategy): Customized CSV parser using the given CSVStrategy. |
Modifier and Type | Method and Description |
---|---|
private CSVParser.Token | encapsulatedTokenLexer(CSVParser.Token tkn, int c): An encapsulated token lexer. Encapsulated tokens are surrounded by the given encapsulating-string. |
java.lang.String[][] | getAllValues(): Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values). |
java.lang.String[] | getLine(): Parses from the current point in the stream until the end of the current line. |
int | getLineNumber(): Returns the current line number in the input stream. |
CSVStrategy | getStrategy(): Obtains the specified CSV strategy. |
private boolean | isEndOfFile(int c) |
private boolean | isEndOfLine(int c): Greedy: accepts \n and \r\n. This checker silently consumes the second control character... |
private boolean | isWhitespace(int c) |
protected CSVParser.Token | nextToken(): Convenience method for nextToken(null). |
protected CSVParser.Token | nextToken(CSVParser.Token tkn): Returns the next token. |
java.lang.String | nextValue(): Parses the CSV according to the given strategy and returns the next csv-value as a string. |
private int | readEscape(int c) |
CSVParser | setStrategy(CSVStrategy strategy): Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy). |
private CSVParser.Token | simpleTokenLexer(CSVParser.Token tkn, int c): A simple token lexer. Simple tokens are tokens which are not surrounded by encapsulators. |
protected int | unicodeEscapeLexer(int c): Decodes Unicode escapes. |
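Decoding a Unicode escape of the form `\uXXXX` can be sketched as follows. This is an illustration of the technique only, not the body of unicodeEscapeLexer: `UnicodeEscapeSketch` and `decode` are hypothetical names, and the sketch assumes the backslash has already been read and discarded so that the reader is positioned at the `u`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch of backslash-u escape decoding; NOT the CSVParser source. */
final class UnicodeEscapeSketch {

    /**
     * Call after the backslash has been consumed.
     * Expects 'u' followed by exactly four hex digits and returns the decoded char value.
     */
    static int decode(Reader in) throws IOException {
        if (in.read() != 'u') {
            throw new IOException("expected 'u' after the backslash");
        }
        int code = 0;
        for (int i = 0; i < 4; i++) {
            int ch = in.read();
            // End of input or a non-hex character makes the sequence invalid.
            int digit = (ch == -1) ? -1 : Character.digit((char) ch, 16);
            if (digit < 0) {
                throw new IOException("wrong unicode escape sequence");
            }
            code = code * 16 + digit; // shift previous digits left by one hex place
        }
        return code;
    }

    public static void main(String[] args) throws IOException {
        // The reader holds what follows the backslash of an escape for 'A'.
        System.out.println((char) decode(new StringReader("u0041"))); // prints A
    }
}
```

Signalling a malformed sequence with an IOException follows the documented contract of unicodeEscapeLexer; the message texts are made up for this sketch.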
private static final int INITIAL_TOKEN_LENGTH
protected static final int TT_INVALID
protected static final int TT_TOKEN
protected static final int TT_EOF
protected static final int TT_EORECORD
private static final java.lang.String[] EMPTY_STRING_ARRAY
private final ExtendedBufferedReader in
private CSVStrategy strategy
private final java.util.ArrayList record
private final CSVParser.Token reusableToken
private final CharBuffer wsBuf
private final CharBuffer code
public CSVParser(java.io.InputStream input)
Deprecated. Use CSVParser(Reader).
CSV parser using the default CSVStrategy.
input - an InputStream containing "csv-formatted" stream

public CSVParser(java.io.Reader input)
CSV parser using the default CSVStrategy.
input - a Reader containing "csv-formatted" input

public CSVParser(java.io.Reader input, char delimiter)
Deprecated. Use CSVParser(Reader,CSVStrategy).
Uses the default CSVStrategy except for the delimiter setting.
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation

public CSVParser(java.io.Reader input, char delimiter, char encapsulator, char commentStart)
Deprecated. Use CSVParser(Reader,CSVStrategy).
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation
encapsulator - a char used as value encapsulation marker
commentStart - a char used for comment identification

public CSVParser(java.io.Reader input, CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
input - a Reader containing "csv-formatted" input
strategy - the CSVStrategy used for CSV parsing

public java.lang.String[][] getAllValues() throws java.io.IOException
The returned content starts at the current parse-position in the stream.
java.io.IOException - on parse error or input read-failure

public java.lang.String nextValue() throws java.io.IOException
java.io.IOException - on parse error or input read-failure

public java.lang.String[] getLine() throws java.io.IOException
java.io.IOException - on parse error or input read-failure

public int getLineNumber()

protected CSVParser.Token nextToken() throws java.io.IOException
Convenience method for nextToken(null).
java.io.IOException

protected CSVParser.Token nextToken(CSVParser.Token tkn) throws java.io.IOException
tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
java.io.IOException - on stream access error

private CSVParser.Token simpleTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
tkn - the current token
c - the current character
java.io.IOException - on stream access error

private CSVParser.Token encapsulatedTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
tkn - the current token
c - the current character
java.io.IOException - on invalid state

protected int unicodeEscapeLexer(int c) throws java.io.IOException
c - current char, which is discarded because it is the "\" of "\uXXXX"
java.io.IOException - on wrong unicode escape sequence or read error

private int readEscape(int c) throws java.io.IOException
java.io.IOException

public CSVParser setStrategy(CSVStrategy strategy)
Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy).

public CSVStrategy getStrategy()

private boolean isWhitespace(int c)
private boolean isEndOfLine(int c) throws java.io.IOException
java.io.IOException
private boolean isEndOfFile(int c)
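
The greedy end-of-line check summarized for isEndOfLine (accept `\n` and `\r\n`, silently consuming the second control character) can be sketched with a one-character look-ahead. This is an assumption-laden illustration rather than the actual method: `EndOfLineSketch` and `countLineBreaks` are hypothetical, and a `java.io.PushbackReader` stands in for the parser's ExtendedBufferedReader, whose API is not shown here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch of a greedy end-of-line check; NOT the CSVParser source. */
final class EndOfLineSketch {

    /** True for '\n' and for "\r\n"; in the latter case the '\n' is consumed as well. */
    static boolean isEndOfLine(int c, PushbackReader in) throws IOException {
        if (c == '\r') {
            int next = in.read();
            if (next == '\n') {
                return true;          // second control character consumed silently
            }
            if (next != -1) {
                in.unread(next);      // not part of a line break: put the look-ahead back
            }
            return false;             // a lone '\r' is not accepted, matching the description
        }
        return c == '\n';
    }

    /** Usage helper: counts the line breaks recognized in a string. */
    static int countLineBreaks(String text) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader(text));
        int count = 0;
        for (int c = in.read(); c != -1; c = in.read()) {
            if (isEndOfLine(c, in)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "\r\n" and "\n" each count once; the lone '\r' before 'd' does not count.
        System.out.println(countLineBreaks("a\r\nb\nc\rd")); // prints 2
    }
}
```

Because the check consumes input as a side effect, a caller can treat `\r\n` as one line end without tracking the previous character.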