Parsing CSV in Java when the text qualifier is applied only if the content contains a comma

I have a CSV file whose content looks like this:

1,"hello, there",I have a csv in which,"only when ""double quote"" or comma are there in the content",it will be wrapped in the double quotes,otherwise not,something like 1/2" will not be wrapped up in double quotes.
1
hello, there
I have a csv in which
only when "double quote" or comma are there in the content
it will be wrapped in the double quotes
otherwise not
something like 1/2" will not be wrapped up in double quotes.
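I tried OpenCSV and other CSV libraries to parse it, but none of them worked.

I also tried a referenced regular expression, but that did not work either.

However, when I open the file in Excel, it works fine. Can anyone give me a hint on how to parse this CSV file?

Note that content is enclosed in the text qualifier only when it contains a comma. When such content is wrapped in double quotes and a double quote is part of the content, that double quote is escaped with another double quote; in other words, it becomes two double quotes. But if the content merely contains a double quote and no comma, it is not enclosed in the text qualifier.

Please advise.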

When the content above is parsed, the output should look like this:

1
hello, there
I have a csv in which
only when "double quote" or comma are there in the content
it will be wrapped in the double quotes
otherwise not
something like 1/2" will not be wrapped up in double quotes.
I tried OpenCSV, and I also tried splitting with a regular expression:

",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
But neither worked.
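
For illustration, here is a minimal sketch of why that split pattern struggles with this data. The pattern only splits at a comma when an even number of double quotes remains up to the end of the line, so it assumes every quote is part of a balanced pair. The stray quote in `1/2"` makes the total odd, which flips the parity for every comma before it. The class and method names are hypothetical, chosen only for this demo.

```java
import java.util.Arrays;

final class QuoteCountingSplitDemo {
    // The split pattern from the question: break at a comma only when an even
    // number of double quotes remains between it and the end of the line.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        // A limit of -1 keeps trailing empty fields.
        return line.split(SPLIT_REGEX, -1);
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",I have a csv in which,"
                + "\"only when \"\"double quote\"\" or comma are there in the content\","
                + "it will be wrapped in the double quotes,otherwise not,"
                + "something like 1/2\" will not be wrapped up in double quotes.";
        String[] parts = split(line);
        System.out.println(parts.length + " parts");
        Arrays.stream(parts).forEach(System.out::println);
    }
}
```

On the sample row the only comma followed by an even number of quotes is the one inside "hello, there", so the row is cut in exactly the wrong place and comes back as two pieces instead of seven.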

My data looks like this:

PRODUCT,,1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL,

I would like it parsed as follows (<BLANK> denotes an empty cell, as seen when the file is opened in Excel):

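PRODUCT
<BLANK>
1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
P
2510906459
<BLANK>
DEWALT TOOLS
<BLANK>
<BLANK>
<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL
<BLANK>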
Try the following regular expression:

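import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// First alternative: a quoted field, captured without its surrounding quotes.
// Second alternative: an unquoted (possibly empty) field.
Pattern regex = Pattern.compile("\"(.*?)\"(?=,|$)|(?<=(?:,|^))(.*?)(?=,|$)");

// try-with-resources closes the file once all lines have been processed
try (Stream<String> lines = Files.lines(Paths.get("path to csv file"))) {
    lines.forEach(line -> {
        Matcher matcher = regex.matcher(line);
        while (matcher.find()) {
            // group(1) is non-null only when the quoted alternative matched
            String content = matcher.group(1) == null ? matcher.group() : matcher.group(1);
            System.out.println(content);
        }
    });
}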
It prints:

1
hello, there
I have a csv in which
only when ""double quote"" or comma are there in the content
it will be wrapped in the double quotes
otherwise not
something like 1/2" will not be wrapped up in double quotes.
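
As the output shows, the regex leaves the doubled quotes in place. One alternative is a character-by-character scan, which can un-escape them while it splits. The following is only a minimal sketch, assuming one record per line (no embedded line breaks) and assuming a field counts as quoted only when its very first character is a double quote; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientCsvLine {
    /**
     * Splits one CSV record. A field is treated as quoted only when its first
     * character is a double quote; inside a quoted field, "" stands for one
     * literal quote. A quote anywhere else (as in 1/2") is plain text.
     */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote: "" becomes "
                        i++;
                    } else {
                        inQuotes = false;  // closing text qualifier
                    }
                } else {
                    field.append(c);       // commas are literal inside quotes
                }
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                atFieldStart = true;
                continue;                  // next character starts a new field
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;           // opening text qualifier
            } else {
                field.append(c);           // includes stray quotes such as 1/2"
            }
            atFieldStart = false;
        }
        fields.add(field.toString());      // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",I have a csv in which,"
                + "\"only when \"\"double quote\"\" or comma are there in the content\","
                + "it will be wrapped in the double quotes,otherwise not,"
                + "something like 1/2\" will not be wrapped up in double quotes.";
        parse(line).forEach(System.out::println);
    }
}
```

Because a quote only opens a qualified field at the start of a field, values such as `1/2" 18V ...` or `<img src="...">` pass through unchanged, and empty fields come back as empty strings. Records that span several lines are outside the scope of this sketch.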

I had no trouble parsing your input:

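import java.io.Reader;
import java.io.StringReader;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

String input = "PRODUCT,,1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,"
        + "<br><img src=\"http://example.com/image.png\"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>"
        + "DEWALT TOOLS DCD960KL - 1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL,";
Reader reader = new StringReader(input);
CsvParserSettings settings = new CsvParserSettings(); // many options here, check the tutorial
settings.setNullValue("<BLANK>"); // the value returned in place of null (empty) fields
String[] row = new CsvParser(settings).parseAll(reader).get(0);
for (String element : row) {
    System.out.println(element);
}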

Output:

PRODUCT
<BLANK>
1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
P
2510906459
<BLANK>
DEWALT TOOLS
<BLANK>
<BLANK>
<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL
<BLANK>

Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).

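Your question is currently hard to read and understand. Could you please format it properly, add examples of what your content might be and how you want it split, and show what you have tried so far?

Thanks, Sebastian, for reading this. I have edited the content. Please let me know if it is more readable now.

As stated before, unless you give a stricter definition for values like 1/2", this problem is impossible to solve.

Hi Darius, thanks for your comment. I have updated the question with more details. Please take a look and let me know.

@saleem mirza Thanks for your answer. It works like a charm. The only thing missing is that it does not return the last item. Can you suggest a modification? Also, the doubled quotes come through as-is; they are not reduced to unescaped double quotes.

Please check the updated code. I hope it works for you.

Hi @JeronimoBackes, this is my dream API for parsing CSV. Thank you very much, it saved me a lot of time. None of the other parsers could detect the multi-line scenario, where this approach works very well.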