PHP regex: split on commas that are not inside single quotes, allowing escaped quotes
I'm looking for a regular expression for preg_match_all in PHP 5 that lets me split a string on commas, as long as the comma is not inside single quotes, while still allowing escaped single quotes. Sample data:
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
This should produce matches like the following:
(some_array
'some, string goes here'
'another_string'
'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
anonquotedstring
83448545
1210597346 + '000'
1241722133 + '000')
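To pin down the splitting rule before reaching for a regex, here is a minimal hand-written scanner sketch (not from the original thread). split_outside_quotes() is a hypothetical helper name; it assumes that a backslash inside a single-quoted string escapes the next character, and it keeps the quotes and trims surrounding whitespace so the result lines up with the expected matches listed above.

```php
<?php
// Hypothetical helper: split $input on commas that are not inside
// single-quoted strings. Inside quotes, a backslash escapes the next
// character, so \' does not close the string. Quotes and backslashes
// are kept; each field is trimmed of surrounding whitespace.
function split_outside_quotes($input)
{
    $fields = array();
    $current = '';
    $inQuotes = false;
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];

        if ($inQuotes && $ch === '\\' && $i + 1 < $len) {
            // Copy the backslash and the escaped character verbatim.
            $current .= $ch . $input[$i + 1];
            $i++;
            continue;
        }

        if ($ch === "'") {
            // An unescaped single quote opens or closes a quoted string.
            $inQuotes = !$inQuotes;
        } elseif ($ch === ',' && !$inQuotes) {
            // Only commas outside quotes end a field.
            $fields[] = trim($current);
            $current = '';
            continue;
        }

        $current .= $ch;
    }

    // The last field has no trailing comma.
    $fields[] = trim($current);
    return $fields;
}

$input = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";

print_r(split_outside_quotes($input));
```

A scanner like this is easy to extend (for example, to also honor double quotes) without the readability cost of a large alternation.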
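I've tried many, many regular expressions. My current one looks like this, though it doesn't match 100% correctly (it still splits on some commas inside single quotes):
“/”(.*)(
Have you tried str_getcsv()? It does exactly what you need without a regular expression: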
$result = str_getcsv($str, ",", "'");
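The third argument is what makes this work: str_getcsv() uses a double quote as its default enclosure, so without "'" single quotes are ordinary characters. A small comparison on a made-up sample string; the escape argument "\\" is the documented default and is passed explicitly because recent PHP versions may warn when it is omitted:

```php
<?php
$line = "a,'b, c',d";

// Default double-quote enclosure: single quotes are plain characters,
// so the comma inside 'b, c' splits the field in two.
$default = str_getcsv($line, ",", '"', "\\");

// Single-quote enclosure: the quoted comma is preserved and the
// enclosing quotes are removed from the field value.
$single = str_getcsv($line, ",", "'", "\\");

print_r($default);
print_r($single);
```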
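You can even implement str_getcsv() in PHP versions earlier than 5.3 by mapping it onto fgetcsv(), using the following snippet from the documentation: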
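if (!function_exists('str_getcsv')) {
    // Fallback for PHP < 5.3: write the string to an in-memory stream and let fgetcsv() parse it.
    // $escape and $eol are accepted for signature compatibility but ignored
    // (fgetcsv() only gained an $escape parameter in PHP 5.3).
    function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = null, $eol = null) {
        $temp = fopen("php://memory", "r+"); // "r+" opens the stream for reading and writing
        fwrite($temp, $input);
        fseek($temp, 0);
        // Length 0 removes the line-length limit (PHP 5.1+); a fixed 4096 would split longer input.
        $r = fgetcsv($temp, 0, $delimiter, $enclosure);
        fclose($temp);
        return $r;
    }
}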
In PHP 5.3 and later you can use str_getcsv() directly. With your example:
$input=<<<STR
(some_array, 'some, string goes here','another_string','this string may contain "double quotes" but it can\'t split on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')
STR;
$data=str_getcsv($input, ",", "'");
print_r($data);
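str_getcsv() strips the enclosing single quotes and keeps the backslash of an escaped quote, but as far as I can tell it leaves the space that follows each comma on unquoted fields. A minimal follow-up sketch that trims every field (array_map() with 'trim' is plain PHP; nothing here is from the original answer):

```php
<?php
$input = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";

// Enclosure ' and escape \ (the default escape, passed explicitly).
$fields = str_getcsv($input, ",", "'", "\\");

// Unquoted fields may keep the space after each comma; trim them all.
$fields = array_map('trim', $fields);

print_r($fields);
```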
I'd second using a CSV parser here; that's what they're for.
If you insist on using a regular expression, you could use:
// $subject holds the input string from the question
preg_match_all(
'/\s*" # either match " (optional preceding whitespace),
(?:\\\\. # followed either by an escaped character
| # or
[^"] # any character except "
)* # any number of times,
"\s* # followed by " (and optional whitespace).
| # Or: do the same thing for single-quoted strings.
\s*\'(?:\\\\.|[^\'])*\'\s*
| # Or:
[^,]* # match anything except commas (i.e. any remaining unquoted strings)
/x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
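Because the last alternative, [^,]*, can also match the empty string, preg_match_all() reports an empty entry at each comma and at the end of the subject, and unquoted fields keep their leading space. A minimal cleanup sketch using the same pattern written on one line (the trimming and filtering step is my addition, not part of the answer):

```php
<?php
$subject = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";

// Same three alternatives as above: double-quoted string,
// single-quoted string, or a run of non-comma characters.
preg_match_all(
    '/\s*"(?:\\\\.|[^"])*"\s*|\s*\'(?:\\\\.|[^\'])*\'\s*|[^,]*/',
    $subject,
    $matches
);

// Trim each match and drop the empty ones; 'strlen' (rather than no
// callback) keeps a legitimate field such as "0".
$fields = array_values(array_filter(array_map('trim', $matches[0]), 'strlen'));

print_r($fields);
```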
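However, as you can see, that regex is ugly and hard to maintain. Please use the right tool for the job.
Using a lookbehind, you can get close to what you want: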
$test = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";
preg_match_all('`
(?:[^,\']|
\'((?<=\\\\)\'|[^\'])*\')*
`x', $test, $result);
print_r($result);
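Both preg_match_all() patterns in this thread describe the fields themselves, and each can match an empty string, so empty entries show up between fields. A different approach, not taken from either answer, is to match only the separating commas and pass the pattern to preg_split(), using PCRE's backtracking verbs (*SKIP)(*FAIL) to jump over whole quoted strings. This sketch assumes a PCRE build with backtracking verbs (PCRE 7.3 or later; current PHP releases bundle PCRE2) and the same backslash-escape rule as the question:

```php
<?php
$subject = "(some_array, 'some, string goes here','another_string','this string may contain \"double quotes\" but, it can\\'t split, on escaped single quotes', anonquotedstring, 83448545, 1210597346 + '000', 1241722133 + '000')";

// First alternative: a complete single-quoted string (backslash escapes
// the next character), then (*SKIP)(*FAIL) makes the attempt fail and
// restarts the search after the quoted string. Second alternative: a
// comma, which can therefore only be found outside quotes.
$pattern = "/'(?:\\\\.|[^'\\\\])*'(*SKIP)(*FAIL)|,/";

// Split on those commas and trim the space that follows them.
$fields = array_map('trim', preg_split($pattern, $subject));

print_r($fields);
```

Since the pattern never matches an empty string, no filtering step is needed and there is no extra capture-group array in the result.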
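This can be done, but it's much harder than most people think; it sounds like you're finding that out. Is there really no library function for this in PHP? There is in Perl. If you haven't got a good answer by then, I may try to put the regex together for you later.
Gosh... I feel dumb, lol. I've been writing PHP for eight years and never used that function. This solution works.
str_getcsv isn't available to me because I'm not running PHP 5.3+.
Unfortunately, str_getcsv is inconsistent in how it handles commas inside single quotes:
@greggles: I'm not aware of any interpretation of CSV that allows single quotes as the string enclosure. It's not in … However, according to the documentation, PHP does let you set the enclosure to a single quote.
Good point. My case happened to use ' as the field enclosure, though I don't know why. I'll definitely consider moving to " in the future.
The second array is "e g s 0"? Do you intend to throw it away?
Output of the str_getcsv() example (heredoc input):
Array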
(
[0] => (some_array
[1] => some, string goes here
[2] => another_string
[3] => this string may contain "double quotes" but it can\'t split on escaped single quotes
[4] => anonquotedstring
[5] => 83448545
[6] => 1210597346 + '000'
[7] => 1241722133 + '000')
)
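Output of the lookbehind example; the second array holds capture group 1, i.e. the last character matched inside each quoted string:
Array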
(
[0] => Array
(
[0] => (some_array
[1] =>
[2] => 'some, string goes here'
[3] =>
[4] => 'another_string'
[5] =>
[6] => 'this string may contain "double quotes" but, it can\'t split, on escaped single quotes'
[7] =>
[8] => anonquotedstring
[9] =>
[10] => 83448545
[11] =>
[12] => 1210597346 + '000'
[13] =>
[14] => 1241722133 + '000')
[15] =>
)
[1] => Array
(
[0] =>
[1] =>
[2] => e
[3] =>
[4] => g
[5] =>
[6] => s
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] => 0
[13] =>
[14] => 0
[15] =>
)
)