Java CSVReader: error when using " as the escape character


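I am using OpenCSV.

I have a CSVReader that is trying to parse a CSV file. The file uses " as the quote character, , as the separator, and " as the escape character.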

Note that the CSV contains cells like these:

"ballet 24"" classes"
"\"
which actually represent these values:

ballet 24" classes
\
For example:

"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","\","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""
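My problem is that I cannot pass " as the escape character to the CSVReader constructor (that is, make it the same as the quote character). If I do, CSVReader goes haywire and reads the entire CSV line as a single cell.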


Has anyone else run into this error, and how can it be avoided?

It works if you use CSVReader's default settings.

See this open bug of theirs:

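Actually, it works fine, just not the way you think. Its defaults are comma as the separator, double quote as the quote character, and backslash as the escape character. However, it understands two consecutive quote characters as an escaped quote character. So if you just use the defaults, it works fine.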
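The quote-doubling rule described in that bug comment can be sketched in a few lines of plain Java. This is only an illustration of the convention, not OpenCSV code; the class name QuoteDoublingDemo and the method unquote are hypothetical names chosen for this example.

```java
class QuoteDoublingDemo {
    /** Decodes one quoted CSV cell: drops the enclosing quotes and turns "" into ". */
    static String unquote(String cell) {
        boolean quoted = cell.length() >= 2
                && cell.charAt(0) == '"'
                && cell.charAt(cell.length() - 1) == '"';
        if (!quoted) {
            return cell; // not a quoted cell, return it unchanged
        }
        String inner = cell.substring(1, cell.length() - 1);
        // Two consecutive quotes inside the cell stand for one literal quote.
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"ballet 24\"\" classes\"")); // prints: ballet 24" classes
        System.out.println(unquote("\"\\\""));                    // prints: \
    }
}
```

Under this rule the backslash has no special meaning, which is why the cell "\" is expected to decode to a single backslash.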

By default it can escape a double quote with a double quote, but your "true" escape character must still be some other character.

So the following works:

CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
  • comma as the separator
  • double quote as the quote character
  • dash (or any other character) as the escape character

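At first I used "\" as the escape character, but then the "\" field would have had to be modified to escape the escape character itself.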
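To see why the choice of escape character matters for the "\" cell, here is a deliberately simplified model of a line splitter with a separate quote and escape character. It is a sketch written for this thread, not OpenCSV's actual implementation, and the names EscapeCharDemo and split are hypothetical. The model assumes an escape character only acts inside quotes and only when followed by a quote or another escape character.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeCharDemo {
    /** Splits one CSV line using a separator, a quote character and a separate escape character. */
    static List<String> split(String line, char separator, char quote, char escape) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            char next = (i + 1 < line.length()) ? line.charAt(i + 1) : '\0';
            if (inQuotes && c == escape && (next == quote || next == escape)) {
                cell.append(next); // escaped character is taken literally
                i++;
            } else if (inQuotes && c == quote && next == quote) {
                cell.append(quote); // doubled quote inside a quoted cell
                i++;
            } else if (c == quote) {
                inQuotes = !inQuotes; // opening or closing quote
            } else if (c == separator && !inQuotes) {
                cells.add(cell.toString()); // separator outside quotes ends the cell
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString());
        return cells;
    }

    public static void main(String[] args) {
        // The line is: "ballet 24"" classes","\","Broad"
        String line = "\"ballet 24\"\" classes\",\"\\\",\"Broad\"";
        System.out.println(split(line, ',', '"', '-').size());  // prints 3
        System.out.println(split(line, ',', '"', '\\').size()); // prints 2
    }
}
```

With a dash as the escape character the backslash is ordinary data and the line splits into three cells. With a backslash as the escape character, the backslash swallows the closing quote of the "\" cell, so the following cells run together.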

CSVReader is not fully RFC 4180 compliant. Use OpenCSV's newer CSV parser instead (RFC4180Parser):

To parse a single CSV-formatted line held in a String:

String test = "ballet 24\"\" classes";
String[] columns = new RFC4180Parser().parseLine(test);
To use it with a reader (the alternative is reader.readNext()):

For more details, see


Taken from

This cannot be done with CSVReader; the following uses PySpark instead:

from pyspark.sql.session import SparkSession

spark = SparkSession(sc)
rdd = spark.read.csv("csv.csv", multiLine=True, header="False", encoding='utf-8', escape= "\"")

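At first I defined \ as the escape character, but one of your fields uses that character as an ordinary character. So I changed it to another dummy character (a dash) and it worked.

I had already thought of this approach myself (I used an exotic Unicode character that will never appear in my CSV file instead of a dash), but this is nonsense, right? Why is there no option to make the quote character the escape character as well?

Thanks for the bug link. So it seems none of them understand that this is a bug. As one person wrote in the bug report, "OpenCSV can write CSV files that it cannot read". That is the big problem here. CSVWriter is more flexible than CSVReader, and there is quite a lot of asymmetry between them.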
for (String[] line : reader.readAll()) {
  for (String s : line) {
    System.out.println(s);
  }
}