PHP: splitting a string into an array with a regular expression to get key-value pairs

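I am parsing a text, but I cannot get one of the pieces when the space after it is missing (a missing space is allowed).
Edit: I added colons to the free text.
Edit: This is an arbitrary text format for writing key-value pairs. After discarding element [0], the remaining array elements form a key, value sequence. Values may span multiple lines.

This is the test-case text: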

:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text

This is what I want:

Array
(
    [0] => 
    [1] => part1
    [2] =>  only one \s removed:OK
    [3] => part2
    [4] => :text :with
new lines
on it
    [5] => noSpaceAfterThis
    [6] => 
    [7] => thisShoudBeAStandAlongText
    [8] => but: here there are more text
    [9] => part4
    [10] => :even more text
)

This is what I get:

Array
(
    [0] => 
    [1] => part1
    [2] =>  only one \s removed:OK
    [3] => part2
    [4] => :text :with
new lines
on it
    [5] => noSpaceAfterThis
    [6] => :thisShoudBeAStandAlongText but: here there are more text
    [7] => part4
    [8] => :even more text
)

This is my test code:

<?php
$text = '
:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';

echo '<pre>';
// my effort so far:
$ret = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);

// nor this one:
$ret = preg_split('|\r?\n:([\w\d]+)\r?\s?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($ret);

// for debugging, an extra capturing group
$ret = preg_split('|\r?\n:([\w\d]+)(\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
var_dump($ret);

Another approach, using preg_match_all:
# capture the first word of each line that begins with a colon #
(?<=^:|\n:)       # lookbehind: preceded by the beginning of the string
                  # and a colon, or by a newline and a colon
\S++              # everything that is not whitespace

# capture all the content up to the next line with : in first position #
(?<=\s)           # lookbehind: preceded by a whitespace character
(?:               # open a non-capturing group
   [^:]+?         # any character that is not a colon, one or more times (lazy)
  |               # OR
   (?<!^|\n):     # negative lookbehind: a colon not preceded by a newline
                  # or by the beginning of the string
)+?               # close the non-capturing group,
                  # repeated one or more times (lazy)
(?= *+(?>\n:|$))  # lookahead: followed by spaces (zero or more) and a newline
                  # with a colon in first position, or by the end of the string
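
To make the two commented fragments above concrete, here is a minimal sketch that joins them with | into one pattern and runs it through preg_match_all. Joining them this way is an assumption (the combined pattern is not shown above), and $pattern and $matches are illustrative names only. Note that this variant returns nothing for a key without a value, so noSpaceAfterThis is followed directly by the next key instead of an empty string.

```php
<?php
// Same test text as in the question.
$text = '
:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';

// Assumption: the two fragments are alternatives of a single pattern.
// Left alternative:  the word right after a line-leading colon (the key).
// Right alternative: everything up to the next line starting with a colon (the value).
$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!^|\n):)+?(?= *+(?>\n:|$))~';

if (preg_match_all($pattern, $text, $matches) === false) {
    throw new RuntimeException('preg_match_all failed with code ' . preg_last_error());
}

// $matches[0] is a flat list of keys and values in document order.
print_r($matches[0]);
```

Because the empty value is not reported, pairing elements two by two is only safe when every key has a value; the preg_split version keeps the empty string and does not have this limitation.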

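Explanation:
The goal is to split the text in two situations:

on a newline, when the first character of the next line is :
at the first space of the line, when the line begins with :

So there are two split points around the word at the start of each line that begins with :.
The : and the space after the word must be removed, but the word itself must be kept. That is why I use PREG_SPLIT_DELIM_CAPTURE to keep the word.
Pattern details: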
(?:           # non-capturing group (everything inside is removed)
   \s*\n      # trim trailing whitespace of the previous line, and the newline
  |           # OR
   ^          # the beginning of the string
)             # end of the non-capturing group
:             # remove the first character when it is a :
(\S++)        # keep the first word with DELIM_CAPTURE
(?: )?        # remove the first space if present
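
The last line of the pattern is what separates it from the attempt in the question. There, the optional \s also matches a newline, so after :noSpaceAfterThis it consumes the line break that the next delimiter needs, and the following key ends up inside a value. A minimal sketch comparing the two patterns on the test text ($attempt and $fixed are illustrative names):

```php
<?php
// Same test text as in the question.
$text = '
:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';

// Pattern from the question: the optional \s can swallow a newline.
$attempt = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Pattern from the answer: only a literal space is optional after the key.
$fixed = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

echo count($attempt), "\n"; // 9
echo count($fixed), "\n";   // 11

// With the answer's pattern, the key without a value gets an empty string.
var_dump($fixed[5], $fixed[6]); // "noSpaceAfterThis", ""
```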

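@路易西: Ah, sorry, that was a typo. Thank you. I have edited my question: the colon must be at the beginning of the line.
Of course, nobody will understand the question and the answer ;-)
@杰克: I added some explanation.
It would help others to know what the rules are, that is, why it should match the way you want :)
@杰克: You are right; I hope you agree with the edit I made.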
$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
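
As the question notes, once element [0] is discarded the remaining elements alternate between key and value. A minimal sketch of turning that flat result into an associative array follows; parseColonPairs is a hypothetical helper name, not part of the original answer, and a repeated key simply overwrites the earlier value.

```php
<?php
/**
 * Hypothetical helper: split the text with the pattern from the answer
 * and fold the flat key, value, key, value... list into an array.
 */
function parseColonPairs(string $text): array
{
    $parts = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        throw new RuntimeException('preg_split failed with code ' . preg_last_error());
    }

    // Element [0] is whatever precedes the first key; discard it.
    array_shift($parts);

    $pairs = [];
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $pairs[$parts[$i]] = $parts[$i + 1]; // key => value
    }
    return $pairs;
}

$text = '
:part1  only one \s removed:OK
:part2 :text :with
new lines
on it
:noSpaceAfterThis
:thisShoudBeAStandAlongText but: here there are more text
:part4 :even more text';

$pairs = parseColonPairs($text);
print_r($pairs);
```

A key with no text after it, such as noSpaceAfterThis, maps to an empty string rather than being dropped.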
