Java: parsing CSV with OpenCSV when a quoted field contains doubled double quotes
I'm trying to parse a CSV file with OpenCSV. One of the columns stores data in YAML-serialized format and is quoted because it can contain commas. It also contains quotes, which are escaped by doubling them (two quote characters). I can parse this file easily in Ruby, but with OpenCSV I can't parse it completely. The file is UTF-8 encoded. Below is the Java snippet I use to read it:
CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');
Below are two rows from this file. The first row is not parsed correctly and gets split at [Fair Trade Certified], I guess because of the escaped double quotes:
1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
:brand_id: 49120
:size: each
:alcoholic: false
:details: ""[Fair Trade Certified]""
:gluten_free: false
:kosher: false
:low_fat: false
:organic: false
:sugar_free: false
:fat_free: false
:vegan: false
:vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,
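Under RFC 4180, a quoted field may contain commas and line breaks, and a doubled quote ("") inside it stands for one literal " character; that is exactly how the YAML column above is encoded. As a minimal sketch of those rules (not OpenCSV's or FastCSV's actual implementation), the hypothetical class Rfc4180Sketch below splits CSV text into records and fields with a small state machine. It assumes the whole input is already in memory as a String and does not report malformed quoting.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Sketch {

    /**
     * Hypothetical helper (illustration only): splits CSV text into records and fields
     * following RFC 4180 quoting. Inside a quoted field, commas and line breaks are
     * literal and "" stands for a single quote character.
     */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote -> one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote of the field
                } else {
                    field.append(c); // commas and newlines are data inside quotes
                }
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true; // opening quote at the start of a field
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
            } else if (c == '\n' || c == '\r') {
                // end of record; a CRLF pair counts as one line break
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                fields.add(field.toString());
                field.setLength(0);
                records.add(fields);
                fields = new ArrayList<>();
            } else {
                field.append(c);
            }
            i++;
        }
        // flush the last record when the text does not end with a line break
        if (field.length() > 0 || !fields.isEmpty()) {
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        // shortened version of the problematic row from the question
        String csv = "1,\"---\n:details: \"\"[Fair Trade Certified]\"\"\n\",,2015-11-01\n";
        for (List<String> record : parse(csv)) {
            System.out.println(record.size() + " fields: " + record);
        }
    }
}
```

On the shortened sample in main this yields one record with 4 fields; the second field keeps its embedded newlines and contains "[Fair Trade Certified]" with each "" collapsed to a single quote, so the row is not split at that point.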
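The solution was to use an RFC 4180-compliant CSV parser, as suggested. I had been using OpenCSV's CSVReader, but it didn't work (or I couldn't get it to work properly). I switched to an RFC 4180 CSV parser (FastCSV), and it works seamlessly: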
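import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;
import java.io.File;
import java.nio.charset.StandardCharsets;

File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount()); // number of fields in each record
}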
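An RFC 4180 writer applies the quoting rule in reverse: a value that contains a comma, a double quote, or a line break is wrapped in double quotes, and every embedded quote is doubled. That is why [Fair Trade Certified] appears as ""[Fair Trade Certified]"" in the raw file. A minimal sketch, using a hypothetical helper quoteField and assuming the values are plain Java strings:

```java
public class Rfc4180Quote {

    /**
     * Hypothetical helper (illustration only): formats one value as an RFC 4180 field.
     * Values containing a comma, quote, CR or LF are quoted, with inner quotes doubled.
     */
    static String quoteField(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value; // safe to write unquoted
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String details = ":details: \"[Fair Trade Certified]\"";
        // joining the formatted fields with commas produces one CSV record
        System.out.println(String.join(",", "28613099", "Product::Source", quoteField(details)));
    }
}
```

Running main prints 28613099,Product::Source,":details: ""[Fair Trade Certified]""" which matches the doubled-quote form found in the question's data.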
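First off, I'm glad FastCSV worked for you, but I took the suspect substring and ran it through openCSV 3.9, and it parsed correctly with both CSVParser and RFC4180Parser. Could you elaborate on how it fails to parse, and/or try openCSV 3.9 to see whether you hit the same problem, and then try the configurations below? Here are the tests I used.

CSVParser: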
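import static org.junit.Assert.assertEquals;

import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.enums.CSVReaderNullFieldIndicator;
import java.io.IOException;
import java.io.StringReader;
import org.junit.Test;

// JUnit 4 test method (place it inside a test class); run against openCSV 3.9
@Test
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {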
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944";
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" ;
StringReader stringReader = new StringReader(bigline);
CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();
String[] item = csvReader.readNext();
assertEquals(5, item.length);
assertEquals("28613099", item[0]);
assertEquals("Product::Source", item[1]);
assertEquals(suspectString, item[2]);
}
RFC4180Parser (Spock test):
def 'parse big line from stackoverflow with complex string'() {
given:
RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
RFC4180Parser parser = builder.build()
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944"
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n"
when:
String[] values = parser.parseLine(bigline)
then:
values.length == 5
values[0] == "28613099"
values[1] == "Product::Source"
values[2] == suspectString
}
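The standard for CSV files is RFC 4180, but it isn't always followed. It covers quoting fields that contain commas and escaping embedded quotes by doubling them. Googling "RFC4180 java parser" turns up several options.
You can't parse it with OpenCSV. Credit where credit is due.
@EJP Not sure what you mean by that, but either way, using an RFC 4180-compliant parser fixed it. Thanks @Paul.
Using an RFC 4180-compliant parser worked for me!!
OpenCSV has had an RFC 4180 parser since at least version 4.1. Here is the javadoc: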