Jackson data format module for reading and writing CSV encoded data, either as "raw" data (sequence of String arrays), or via data binding to/from Java Objects (POJOs).
Project is licensed under Apache License 2.0.
As of version 2.3, this module is considered complete and production ready. All Jackson layers (streaming, databind, tree model) are supported.
To use this extension on Maven-based projects, use the following dependency:
<dependency>
  <groupId>com.fasterxml.jackson.dataformat</groupId>
  <artifactId>jackson-dataformat-csv</artifactId>
  <version>2.4.0</version>
</dependency>
CSV documents are essentially rows of data, instead of JSON Objects (sequences of key/value pairs).
One way to expose this data is therefore as a sequence of JSON arrays, with writing of arrays supported in the same way. Jackson supports this use case (it is what you get if you do not pass a "CSV schema"), but it is not very convenient.
The alternative (and most commonly used) approach is to use a "CSV schema", an object that defines a set of names (and optionally types) for columns. This allows CsvParser to expose CSV data as if it were a sequence of JSON objects, that is, name/value pairs.
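To make the role of the schema concrete, here is a minimal stdlib-only sketch of the idea: column names are paired with the values of a row by position. This is not Jackson's implementation; the class SchemaSketch and the method toObject are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Jackson.
final class SchemaSketch {

    /** Pairs each column name with the value at the same position in the row. */
    static Map<String, String> toObject(List<String> columnNames, String[] row) {
        // LinkedHashMap keeps the column order defined by the "schema"
        Map<String, String> object = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.size() && i < row.length; i++) {
            object.put(columnNames.get(i), row[i]);
        }
        return object;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("firstName", "lastName", "age");
        String[] row = {"Billy", "Bob", "28"};
        // prints {firstName=Billy, lastName=Bob, age=28}
        System.out.println(toObject(columns, row));
    }
}
```

Without the list of names, the same row can only be exposed as a plain array of values.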
So how do you get a CSV Schema instance to use? There are 3 ways:
- Create schema based on a Java class
- Build schema manually
- Use the first line of CSV document to get the names (no types) for Schema
Here is code for the above cases:
CsvMapper mapper = new CsvMapper();

// Schema from POJO (usually has @JsonPropertyOrder annotation)
CsvSchema pojoSchema = mapper.schemaFor(Pojo.class);

// Manually-built schema: one with type, others default to "STRING"
CsvSchema manualSchema = CsvSchema.builder()
        .addColumn("firstName")
        .addColumn("lastName")
        .addColumn("age", CsvSchema.ColumnType.NUMBER)
        .build();

// Read schema from the first line; start with bootstrap instance
// to enable reading of schema from the first line
// NOTE: reads schema and uses it for binding
CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
// 'csv' holds the CSV content (String, File, etc.)
Pojo value = mapper.reader(Pojo.class).with(bootstrapSchema).readValue(csv);
It is important to note that the schema object is needed to ensure correct ordering of columns; schema instances are immutable and fully reusable (as are ObjectWriter instances).
Note also that while explicit type can help efficiency it is usually not required, as Jackson data binding can do common conversions/coercions such as parsing numbers from Strings.
CSV content can be read either using CsvFactory (and the parsers and generators it creates) directly, or through CsvMapper (an extension of the standard ObjectMapper).
When using CsvMapper, you will be creating ObjectReader or ObjectWriter instances that pass CsvSchema along to CsvParser / CsvGenerator.
When creating a parser/generator directly, you will need to explicitly call setSchema(schema) before starting to read/write content.
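As a sketch of the direct (streaming) route described above, the following creates a parser from a CsvFactory, calls setSchema before reading, and collects name/value pairs. The class name DirectParserSketch and the two-column schema are assumptions made for illustration, and the jackson-dataformat-csv dependency is assumed to be on the classpath.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.dataformat.csv.CsvFactory;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; only the Jackson types are real.
final class DirectParserSketch {

    /** Reads two-column CSV and returns entries such as "name=Billy". */
    static List<String> readPairs(String csv) throws IOException {
        // Assumed schema for this example: two untyped columns
        CsvSchema schema = CsvSchema.builder()
                .addColumn("name")
                .addColumn("age")
                .build();
        List<String> pairs = new ArrayList<>();
        try (JsonParser parser = new CsvFactory().createParser(csv)) {
            // The schema must be set explicitly before the first token is read
            parser.setSchema(schema);
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                if (token == JsonToken.FIELD_NAME) {
                    String column = parser.getCurrentName();
                    parser.nextToken(); // advance to the column value
                    pairs.add(column + "=" + parser.getText());
                }
            }
        }
        return pairs;
    }
}
```

With a schema set, each row is exposed as an object (field names followed by values), which is why the loop looks for FIELD_NAME tokens.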
The most common method for writing and then reading CSV data, then, is:
CsvMapper mapper = new CsvMapper();
Pojo value = ...;
CsvSchema schema = mapper.schemaFor(Pojo.class); // schema from 'Pojo' definition
String csv = mapper.writer(schema).writeValueAsString(value);
Pojo result = mapper.reader(Pojo.class).with(schema).readValue(csv);
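The snippets above refer to a Pojo class without defining it. A self-contained sketch of the same round trip could look as follows; the Pojo fields, the column order, and the class name PojoRoundTripSketch are assumptions made for illustration. It uses reader(Class) to match the version this document targets; Jackson 2.5 and later offer readerFor(Class) as the preferred equivalent.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;

// Hypothetical illustration class; only the Jackson types are real.
final class PojoRoundTripSketch {

    // @JsonPropertyOrder fixes the column order used by schemaFor()
    @JsonPropertyOrder({"firstName", "lastName", "age"})
    public static class Pojo {
        public String firstName;
        public String lastName;
        public int age;
    }

    /** Writes the value as one CSV row, then reads that row back. */
    static Pojo roundTrip(Pojo value) throws IOException {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(Pojo.class); // schema from 'Pojo' definition
        String csv = mapper.writer(schema).writeValueAsString(value);
        return mapper.reader(Pojo.class).with(schema).readValue(csv);
    }
}
```

The same schema instance is used for writing and reading, so the column order is guaranteed to match in both directions.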
But even if you do not know (or care) about column names you can read/write CSV documents. The main difference is that in this case data is exposed as a sequence of ("JSON") Arrays, not Objects, as "raw" tabular data.
So let's consider the following CSV input:
a,b
c,d
e,f
By default, Jackson CsvParser would see it as equivalent to the following JSON:
["a","b"]
["c","d"]
["e","f"]
This is easy to use; in fact, if you ignore everything to do with Schema from above examples, you get working code. For example:
CsvMapper mapper = new CsvMapper();
// important: we need "array wrapping" (see next section) here:
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
File csvFile = new File("input.csv"); // or from String, URL etc
MappingIterator<String[]> it = mapper.reader(String[].class).readValues(csvFile);
while (it.hasNext()) {
    String[] row = it.next();
    // and voila, column values in an array. Works with Lists as well
}
But if you want a "data as Map" approach, with data that has expected column names as the first row, followed by data rows, you can iterate over entries quite conveniently as well. Assuming we had CSV content like:
name,age
Billy,28
Barbara,36
we could use the following code:
CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
MappingIterator<Map<String,String>> it = mapper.reader(Map.class)
        .with(schema)
        .readValues(csvFile);
while (it.hasNext()) {
    Map<String,String> rowAsMap = it.next();
    // access by column name, as defined in the header row...
}
and get two rows as java.util.Maps, similar to what JSON like this
{"name":"Billy","age":"28"}
{"name":"Barbara","age":"36"}
would produce.
In addition to reading things as root-level Objects or arrays, you can also force use of virtual "array wrapping".
This means that, using the earlier CSV data example, the parser would instead expose it similar to the following JSON:
[
  ["a","b"],
  ["c","d"],
  ["e","f"]
]
This is useful if functionality expects a single ("JSON") Array; this was the case, for example, when using ObjectReader.readValues() functionality.
- Wiki (includes javadocs)
- How-to
- Performance
Since CSV is a very loose "standard", there are many extensions to basic functionality. Jackson supports the following extensions or variations:
- Customizable delimiters (through CsvSchema)
  - Default separator is comma (,), but any other character can be specified as well
  - Default text quoting is done using double-quote ("), may be changed
  - It is possible to enable use of an "escape character" (by default, not enabled): some variations use \ for escaping. If enabled, the character immediately following it will be used as-is, except for a small set of "well-known" escapes (\n, \r, \t, \0)
  - Linefeed character: when generating content, the default linefeed String used is "\n" but this may be changed
- Null value: by default, null values are serialized as empty Strings (""), but any other String value can be configured to be used instead (like, say, "null", "N/A" etc)
- Use of first row as set of column names: as explained earlier, it is possible to configure CsvSchema to indicate that the contents of the first (non-comment) document row are taken to mean the set of column names to use
- Comments
  - When enabled (via CsvSchema, or by enabling JsonParser.Feature.ALLOW_YAML_COMMENTS), if a row starts with a # character, it will be considered a comment and skipped
There are also some limitations:
- Due to the tabular nature of the CSV format, deeply nested data structures are not well supported.
- Use of Tree Model (JsonNode) is supported, but only within the limitations of the CSV format.
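The delimiter options listed earlier are set on the CsvSchema. The sketch below reads semicolon-separated content that uses single quotes for quoting and takes column names from the first row. The class name CustomDelimiterSketch and the sample input are assumptions made for illustration; reader(Class) is used to match the version this document targets.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; only the Jackson types are real.
final class CustomDelimiterSketch {

    /** Reads rows as Maps from ';'-separated, single-quoted CSV with a header row. */
    static List<Map<String, String>> readRows(String csv) throws IOException {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = CsvSchema.emptySchema()
                .withHeader()               // first row supplies the column names
                .withColumnSeparator(';')   // instead of the default comma
                .withQuoteChar('\'');       // instead of the default double-quote
        List<Map<String, String>> rows = new ArrayList<>();
        try (MappingIterator<Map<String, String>> it =
                     mapper.reader(Map.class).with(schema).readValues(csv)) {
            while (it.hasNext()) {
                rows.add(it.next());
            }
        }
        return rows;
    }
}
```

For input such as name;age followed by 'Billy;the Kid';28, the quoted separator stays part of the value rather than splitting the column.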
Areas that are planned to be improved include things like:
- Optimizations to make number handling as efficient as from JSON (but note: even with existing code, performance is typically limited by I/O and NOT parsing or generation)
  - Although, as per Java CSV parser comparison, this module is actually performing quite well already (at 2.4)
- Mapping of nested POJOs using dotted notation (similar to @JsonUnwrapped, but without requiring its use -- note that @JsonUnwrapped is already supported)