Perl: How do I parse a CSV file that contains serialized PHP?


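I have only just started dabbling in Perl as a way of getting exposure to different programming languages, so forgive me if some of the code below is horrible.

I needed a quick and dirty CSV parser that takes a CSV file and splits it into batches of files containing "X" CSV rows each (taking into account that entries may contain embedded newlines).

I came up with a working solution and it was going fine. However, one of the CSV files I am trying to split contains serialized PHP data.

That seems to break the CSV parsing. As soon as I remove the serialization, the CSV file is parsed correctly.

Are there any tricks I need to know about when parsing serialized data inside a CSV file?

Here is a simplified sample of the code: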

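use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ eol => $/, always_quote => 1, binary => 1 });
my $inputfile = "infile.csv";
my $lines = 0;    # number of CSV rows copied so far

# Parenthesize open() so that "or die" applies to the open call, not to the filename string.
open(my $in, "<:encoding(utf8)", $inputfile)
    or die("cannot open input file $inputfile: $!");
open(my $out, ">", "outfile.000")
    or die("cannot open output file outfile.000: $!");
binmode($out, ":utf8");
while (my $line = $csv->getline($in)) {
    $lines++;
    $csv->print($out, $line);
}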

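The CSV you are trying to read escapes embedded quotes with a backslash, whereas the default is to escape them by doubling the quote. Try adding escape_char => '\\' to the Text::CSV_XS constructor.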

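If the file also uses a backslash before other characters that strictly do not need escaping, such as newlines, you may additionally need allow_loose_escapes => 1.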
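As a minimal sketch of the two options together, the snippet below parses one record in the same style as the sample data. The record itself is a hypothetical two-field example made up for illustration, and it is read from an in-memory handle rather than from infile.csv.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Reader configured for backslash-escaped quotes, as in the sample rows.
my $csv = Text::CSV_XS->new({
    binary              => 1,     # allow embedded CR/LF inside quoted fields
    escape_char         => '\\',  # quotes are written as \" rather than ""
    allow_loose_escapes => 1,     # tolerate a backslash before characters that need no escaping
}) or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical record: CR, backslash, LF inside the first field,
# and a short serialized PHP array in the second.
my $record = qq{"Some text here.\r\\\nSome more text","a:1:{i:0;s:1:\\"3\\";}"\n};

open my $in, '<', \$record or die "Cannot open in-memory handle: $!";
my $row = $csv->getline($in)
    or die "Parse failed: " . $csv->error_diag;
close $in;

printf "%d fields\n", scalar @$row;
print "serialized field: $row->[1]\n";
```

Without allow_loose_escapes the backslash before the line break would be expected to make getline fail, which matches the behaviour described in the comments.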


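Another option is to change the writer so that it escapes quotes by doubling them instead of using backslashes. That may or may not be possible in your case. Doubled quotes are the more common CSV style, and while programmatic parsers can usually read both variants (if told to), you will not be able to read the backslash variant in, for example, Excel.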
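If the program that writes the file cannot be changed, one possible workaround is to convert the file afterwards. The sketch below reads backslash-escaped input with one Text::CSV_XS object and writes it back with a second object that keeps the default escaping, so quotes come out doubled. The input string and the variable names are assumptions for illustration, not taken from the question.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Reader: understands the backslash-escaped input.
my $reader = Text::CSV_XS->new({
    binary              => 1,
    escape_char         => '\\',
    allow_loose_escapes => 1,
}) or die "Cannot create reader: " . Text::CSV_XS->error_diag;

# Writer: escape_char is left at its default (the quote character),
# so embedded quotes are written doubled.
my $writer = Text::CSV_XS->new({
    binary       => 1,
    always_quote => 1,
    eol          => "\n",
}) or die "Cannot create writer: " . Text::CSV_XS->error_diag;

# Hypothetical input row with backslash-escaped quotes.
my $input  = qq{"1","a:1:{i:0;s:1:\\"3\\";}"\n};
my $output = '';

open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
while (my $row = $reader->getline($in)) {
    $writer->print($out, $row) or die "Write failed: " . $writer->error_diag;
}
close $in;
close $out;

print $output;    # prints "1","a:1:{i:0;s:1:""3"";}"
```

For real files, the in-memory handles would be replaced by handles opened on the source and destination paths.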

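Comments:

If your CSV parser is standards-compliant and your CSV file is too, it will "just work". Otherwise you will have to write a bug-for-bug compatible parser.

Can you provide an example of a problematic row?

@JanHudec, added sample row data.

Now we can see what flavour of CSV it is. Because there is no single CSV standard of the kind @tylerl suggests. There are different flavours, and this one differs slightly from what Text::CSV_XS expects by default. But it can be configured with constructor arguments.

@JanHudec, because so far he has almost always lacked migration vote privileges.

Oddly, I still seem to be having trouble with embedded newlines and carriage returns. I will update the question with more sample data. I don't think one row is enough.

@garbetjie: Try adding allow_loose_escapes. Then read the documentation (the arguments you are passing look like they were copied from somewhere and do not make much sense for parsing).

I did read the documentation on it, but evidently not thoroughly enough. Can you offer any insight into why this makes a difference? What is so special about a CR that it breaks the parsing?

@garbetjie: It is too ordinary. The parser does not expect it to be escaped, because it does not need to be escaped inside quotes. By default the parser complains about that, so you have to tell it to ignore unnecessary escapes.

Sample rows added to the question: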
"26","other","1","20,000 Subscriber Plan","Some text here.^M\
Some more text","on","","18","","0","","0","0","recurring","0","","payment","totalsend","0","tsadmin","R34bL9oq","37","0","0","","","","","","","","","","","","","","","","","","","","","","","0","0","0","a:18:{i:0;s:1:\"3\";i:1;s:1:\"2\";i:2;s:2:\"59\";i:3;s:2:\"60\";i:4;s:2:\"61\";i:5;s:2:\"62\";i:6;s:2:\"63\";i:7;s:2:\"64\";i:8;s:2:\"65\";i:9;s:2:\"66\";i:10;s:2:\"67\";i:11;s:2:\"68\";i:12;s:2:\"69\";i:13;s:2:\"70\";i:14;s:2:\"71\";i:15;s:2:\"72\";i:16;s:2:\"73\";i:17;s:2:\"74\";}","","","0","0","","0","0","0.0000","0.0000","0","","","0.00","","6","1"
"27","other","1","35,000 Subscriber Plan","Some test here.^M\
Some more text","on","","18","","0","","0","0","recurring","0","","payment","totalsend","0","tsadmin","R34bL9oq","38","0","0","","","","","","","","","","","","","","","","","","","","","","","0","0","0","a:18:{i:0;s:1:\"3\";i:1;s:1:\"2\";i:2;s:2:\"59\";i:3;s:2:\"60\";i:4;s:2:\"61\";i:5;s:2:\"62\";i:6;s:2:\"63\";i:7;s:2:\"64\";i:8;s:2:\"65\";i:9;s:2:\"66\";i:10;s:2:\"67\";i:11;s:2:\"68\";i:12;s:2:\"69\";i:13;s:2:\"70\";i:14;s:2:\"71\";i:15;s:2:\"72\";i:16;s:2:\"73\";i:17;s:2:\"74\";}","","","0","0","","0","0","0.0000","0.0000","0","","","0.00","","7","1"
"28","other","1","50,000 Subscriber Plan","Some text here.^M\
Some more text","on","","18","","0","","0","0","recurring","0","","payment","totalsend","0","tsadmin","R34bL9oq","39","0","0","","","","","","","","","","","","","","","","","","","","","","","0","0","0","a:18:{i:0;s:1:\"3\";i:1;s:1:\"2\";i:2;s:2:\"59\";i:3;s:2:\"60\";i:4;s:2:\"61\";i:5;s:2:\"62\";i:6;s:2:\"63\";i:7;s:2:\"64\";i:8;s:2:\"65\";i:9;s:2:\"66\";i:10;s:2:\"67\";i:11;s:2:\"68\";i:12;s:2:\"69\";i:13;s:2:\"70\";i:14;s:2:\"71\";i:15;s:2:\"72\";i:16;s:2:\"73\";i:17;s:2:\"74\";}","","","0","0","","0","0","0.0000","0.0000","0","","","0.00","","8","1"
"73","other","8","10,000,000","","","","0","","0","","0","0","recurring","0","","payment","","0","","","75","0","10000000","","","","","","","","","","","","","","","","","","","","","","","0","0","0","a:17:{i:0;s:1:\"3\";i:1;s:1:\"2\";i:2;s:2:\"59\";i:3;s:2:\"60\";i:4;s:2:\"61\";i:5;s:2:\"62\";i:6;s:2:\"63\";i:7;s:2:\"64\";i:8;s:2:\"65\";i:9;s:2:\"66\";i:10;s:2:\"67\";i:11;s:2:\"68\";i:12;s:2:\"69\";i:13;s:2:\"70\";i:14;s:2:\"71\";i:15;s:2:\"72\";i:16;s:2:\"74\";}","","","0","0","","0","0","0.0000","0.0000","0","","","0.00","","14","0"