PHP: using a regular expression to split a string into an array to get key-value pairs
I am parsing a text, but I cannot get one of the fragments when the space after a key is missing (a missing space is allowed).
EDIT: I added colons inside the free text.
EDIT: This is an arbitrary text format for writing key-value pairs. After discarding element [0], the remaining elements of the array form a key, value sequence. Multi-line values are accepted. This is the test-case text:
:part1 only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text
This is what I want:
Array
(
[0] =>
[1] => part1
[2] => only one \s removed:OK
[3] => part2
[4] => :text :with
new lines
on it
[5] => noSpaceAfterThis
[6] =>
[7] => thisShoudBeAStandAlongText
[8] => but: here there are more text
[9] => part4
[10] => :even more text
)
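For reference, the "discard element [0], then read key, value, key, value" convention described above could be sketched as follows. The helper name pairsFromFlat and the shortened sample array are assumptions made for illustration; they are not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): folds the flat array
// produced by the split into a key => value map.
function pairsFromFlat(array $flat): array
{
    // Element [0] is whatever precedes the first key, so it is discarded.
    $items = array_slice($flat, 1);
    $map = [];
    foreach (array_chunk($items, 2) as $chunk) {
        // A trailing key without a value gets an empty string.
        // A repeated key overwrites the earlier value.
        $map[$chunk[0]] = $chunk[1] ?? '';
    }
    return $map;
}

// Shortened version of the wanted array, including the empty value.
$flat = ['', 'part1', 'only one \s removed:OK', 'noSpaceAfterThis', '', 'part4', ':even more text'];
$map = pairsFromFlat($flat);
print_r($map);
```

This only works if every key is followed by exactly one element, which is why the empty element after noSpaceAfterThis matters.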
This is what I get:
Array
(
[0] =>
[1] => part1
[2] => only one \s removed:OK
[3] => part2
[4] => :text :with
new lines
on it
[5] => noSpaceAfterThis
[6] => :thisShoudBeAStandAlongText but: here there are more text
[7] => part4
[8] => :even more text
)
This is my test code:
<?php
$text = '
:part1 only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';
echo '<pre>';
// my effort so far:
$ret = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);
// nor this one:
$ret = preg_split('|\r?\n:([\w\d]+)\r?\s?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);
// for debugging, an extra capturing group
$ret = preg_split('|\r?\n:([\w\d]+)(\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
var_dump($ret);
Another way, matching the pieces with preg_match_all instead of splitting:
# match the first word of a line that begins with a colon #
(?<=^:|\n:)       # lookbehind: preceded by the beginning of the string
                  # and a colon, or by a newline and a colon
\S++              # everything that is not whitespace (possessive)
|                 # OR
# match the content up to the next line that has : in first position #
(?<=\s)           # lookbehind: preceded by a whitespace character
(?:               # open a non-capturing group
    [^:]+?        # any character that is not a colon, one or more times (lazy)
  |               # OR
    (?<!^|\n):    # negative lookbehind: a colon not preceded by a newline
                  # or by the beginning of the string
)+?               # close the non-capturing group,
                  # repeat one or more times (lazy)
(?= *+(?>\n:|$))  # lookahead: followed by zero or more spaces and then a newline
                  # with a colon in first position, or the end of the string
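A minimal sketch of how this pattern might be used, assuming that the two commented parts are joined by an alternation (|) and that the pattern is passed to preg_match_all. The variable names and the way the test text is rebuilt are illustrative choices, not taken from the original answer.

```php
<?php
// Test text rebuilt line by line so the line endings are plain "\n".
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// Assumption: key part and content part joined with "|".
$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!^|\n):)+?(?= *+(?>\n:|$))~';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds every full match in order of appearance.
print_r($matches[0]);
```

Unlike the preg_split variant, this approach appears to produce no empty element for a key that has no value (noSpaceAfterThis), so the matches cannot simply be read as alternating key, value pairs.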
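Explanation:
The goal is to split the text in two situations:
- on a newline, when the first character of the following line is :
- at the first space of a line, when the line begins with :
So there are two split points around the :word at the beginning of a line.
The : and the space that follows the word must be removed, but the word itself must be kept. That is why I use PREG_SPLIT_DELIM_CAPTURE: it keeps the captured word in the result.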
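The effect of PREG_SPLIT_DELIM_CAPTURE can be seen on a much smaller input. The subject string and pattern below are made up for illustration only.

```php
<?php
// Toy input: letters separated by "-digit-" delimiters.
$subject = 'a-1-b-2-c';

// Without the flag, the whole delimiter is dropped, captured digit included.
$plain = preg_split('~-(\d)-~', $subject);

// With the flag, the text captured by the parentheses is kept in the result,
// while the rest of the delimiter (the dashes) is still removed.
$kept = preg_split('~-(\d)-~', $subject, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($plain); // a, b, c
print_r($kept);  // a, 1, b, 2, c
```

In the pattern discussed here, the captured group plays the role of the digit: it is the key word, while the colon, the newline and the optional space are the parts that get thrown away.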
Pattern details:
(?:       # non-capturing group (everything inside will be removed)
    \s*\n # trim the trailing spaces of the preceding line and the newline
  |       # OR
    ^     # it is the beginning of the string
)         # end of the non-capturing group
:         # remove the first character when it is a :
(\S++)    # keep the first word with DELIM_CAPTURE
(?: )?    # remove the first space if present
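@路易西: Ah! Sorry, that was a typo. Thank you. I have edited my question: the colon must be at the beginning of the line.
Of course, nobody will understand the question and the answer ;-)
@杰克: I have added some explanations.
It would help others to know what the rules are, i.e. why should it match the way you want? :)
@杰克, you are right; I hope you agree with the edit I made.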
$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
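As a self-contained check, the call above can be run against the test text from the question. In this sketch the text is rebuilt with implode so that the line endings are plain "\n"; that construction and the printing loop are assumptions for illustration, not part of the original answer.

```php
<?php
// Test text from the question, rebuilt line by line with "\n" endings.
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Walk the result as key, value pairs, skipping element [0].
for ($i = 1; $i + 1 < count($res); $i += 2) {
    printf("%s => [%s]\n", $res[$i], $res[$i + 1]);
}
```

Because PREG_SPLIT_NO_EMPTY is not set, the empty string between noSpaceAfterThis and the next key is kept, which keeps the key, value alternation aligned.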