Java CSVReader: error when using " as the escape character
I am using OpenCSV. I have a CSVReader that tries to parse a CSV file. The file uses " as the quote character, , as the separator character, and " as the escape character.
Note that the CSV contains cells such as:
"ballet 24"" classes"
"\"
which actually represent these values:
ballet 24" classes
\
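The mapping between the two cells and their values can be sketched in plain Java. This is only an illustration of the quote-doubling convention described above, not OpenCSV code; the class name CellQuoting and the methods encode and decode are hypothetical, and decode assumes the cell is always wrapped in quotes.

```java
class CellQuoting {
    // Wrap a raw value in quotes and double every embedded quote character.
    static String encode(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Strip the surrounding quotes and collapse doubled quotes back to one.
    // Assumes the cell starts and ends with a quote character.
    static String decode(String cell) {
        String inner = cell.substring(1, cell.length() - 1);
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(encode("ballet 24\" classes"));       // "ballet 24"" classes"
        System.out.println(encode("\\"));                        // "\"
        System.out.println(decode("\"ballet 24\"\" classes\"")); // ballet 24" classes
    }
}
```

Under this convention the backslash has no special meaning, which is why "\" is simply a cell holding one backslash.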
For example:
"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","\","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""
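My problem is that I cannot specify " as the escape character (that is, the same character as the quote character) in the CSVReader constructor. If I do, CSVReader goes completely crazy and reads the entire CSV line as a single CSV cell.
Has anyone else run into this bug, and how can it be avoided?

It works if you use CsvReader's default settings. Check this open bug of theirs:
Actually, it works fine, just not the way you think. Its defaults are a comma as the separator, a quote as the quote character, and a backslash as the escape character. However, it understands two consecutive quote characters as an escaped quote character. So if you just use the defaults, it works fine.

By default it can escape a double quote with a double quote, but your "true" escape character must still be some other character.
So the following works: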
CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
- comma as the separator
- double quote as the quote character
- dash (or any other character) as the escape character
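To see why the choice of escape character matters for a row containing the "\" cell, here is a minimal sketch of a quote-aware line splitter in plain Java. It is a simplified model written for illustration, not OpenCSV's actual implementation; the names EscapeSketch and split are hypothetical. It treats a doubled quote inside a quoted field as a literal quote, and additionally lets the escape character protect a following quote or escape character.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeSketch {
    // Split one CSV line into fields (simplified model, single-line input only).
    static List<String> split(String line, char sep, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            char next = hasNext ? line.charAt(i + 1) : '\0';
            if (inQuotes && c == escape && hasNext && (next == quote || next == escape)) {
                cur.append(next);          // escaped character is taken literally
                i++;
            } else if (c == quote) {
                if (inQuotes && hasNext && next == quote) {
                    cur.append(quote);     // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                }
            } else if (c == sep && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"22951990817\",\"\\\",\"Broad\",\"0.00\"";
        // Backslash as escape: the quote after \ is swallowed and fields merge.
        System.out.println(split(row, ',', '"', '\\').size()); // 2
        // Dash as escape: the backslash is ordinary data.
        System.out.println(split(row, ',', '"', '-').size());  // 4
    }
}
```

In this model, the backslash escapes the closing quote of the "\" cell, so the following separators are read as part of the field. With a character that never appears in the data, every cell keeps its boundaries.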
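At first I had \ as the escape character, but then the field "\" would have to be modified to escape the escape character.

CSVReader is not fully RFC 4180 compliant. Use OpenCSV's newer parser (RFC4180Parser) instead.
To parse a line of text formatted as CSV: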
String test = "ballet 24\"\" classes";
String[] columns = new RFC4180Parser().parseLine(test);
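To use it with a reader (another option is reader.readNext()):
// Assumption: the reader is built with an RFC4180Parser (OpenCSV 4+); "csv.csv" is a placeholder path.
CSVReader reader = new CSVReaderBuilder(new FileReader("csv.csv"))
        .withCSVParser(new RFC4180ParserBuilder().build())
        .build();
for (String[] line : reader.readAll()) {
    for (String s : line) {
        System.out.println(s);
    }
}
For more details, see [link]. The code is taken from [link].

This cannot be done with CSVReader. Using PySpark instead:
from pyspark.sql.session import SparkSession
spark = SparkSession(sc)
rdd = spark.read.csv("csv.csv", multiLine=True, header="False", encoding='utf-8', escape= "\"")

At first I defined \ as the escape character, but one of your fields uses that character as ordinary data. So I changed it to another dummy character (a dash), and that worked.
I had already thought of this approach myself (instead of a dash I used an exotic Unicode character that will never appear in my CSV file), but this is nonsense, right? Why is there no option to make the quote character the escape character as well?
Thanks for the bug link. So they do not seem to understand that this is a bug. As one person wrote in the bug report, "OpenCSV can write CSV files that it cannot read", and that is the big problem here. CSVWriter is more flexible than CSVReader, and there is quite a lot of asymmetry.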