Extracting text fragments with PHP and regex


Suppose I have a string variable:

$str = '
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]

1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
';
I want to extract some of the information in that variable into an array like this:

Array {
    ["WhiteTitle"] => "GM",
    ["WhiteCountry"] => "Cuba",
    ["BlackCountry"] => "United States"
}
Thanks.

You can use:

preg_match_all('/\[(.*?) "(.*?)"\]/m', $str, $matches, PREG_SET_ORDER);
print_r($matches);
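This gives you every match in an array. In each entry, key 0 holds the full match, key 1 the first captured part (the tag name), and key 2 the second captured part (the value):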

Output:

Array
(
    [0] => Array
        (
            [0] => [WhiteTitle "GM"]
            [1] => WhiteTitle
            [2] => GM
        )

    [1] => Array
        (
            [0] => [WhiteCountry "Cuba"]
            [1] => WhiteCountry
            [2] => Cuba
        )

    [2] => Array
        (
            [0] => [BlackCountry "United States"]
            [1] => BlackCountry
            [2] => United States
        )
)
If you want it in the format you asked for, a simple loop will do:

$array = array();
foreach($matches as $match){
    $array[$match[1]] = $match[2];
}
print_r($array);

Output:

Array
(
    [WhiteTitle] => GM
    [WhiteCountry] => Cuba
    [BlackCountry] => United States
)
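
As a possible alternative to the loop, the same PREG_SET_ORDER result can be reshaped with array_column, which takes a value column and an index column. This is a minimal sketch, assuming PHP 5.5 or later (where array_column is available). The sample string is shortened to the tag lines only, and the /m flag is dropped here because the pattern contains no ^ or $ anchors for it to affect.

```php
<?php
// Sample input: the tag pairs from the question (movetext omitted for brevity).
$str = '
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
';

// Same pattern as in the answer above, without the /m flag.
preg_match_all('/\[(.*?) "(.*?)"\]/', $str, $matches, PREG_SET_ORDER);

// Column 2 (the value) becomes the array values, column 1 (the tag name) the keys.
$array = array_column($matches, 2, 1);
print_r($array);
```

Whether this reads better than the explicit foreach is a matter of taste; the loop is easier to extend if the values need extra processing.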

You can use something like this:

<?php
$string = <<< EOF
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
EOF;

$final = array();
preg_match_all('/\[(.*?)\s+(".*?")\]/', $string, $matches, PREG_PATTERN_ORDER);
for($i = 0; $i < count($matches[1]); $i++) {
    $final[$matches[1][$i]] = $matches[2][$i];
}

print_r($final);

Output:

Array
(
    [WhiteTitle] => "GM"
    [WhiteCountry] => "Cuba"
    [BlackCountry] => "United States"
)

Regex explanation:

\[(.*?)\s+(".*?")\]

Match the character “[” literally «\[»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 2 «(".*?")»
   Match the character “"” literally «"»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “"” literally «"»
Match the character “]” literally «\]»
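
Because the double quotes sit inside group 2 in the pattern above, the captured values keep their quotes. If the bare values are wanted instead, one option is to move the quotes outside the capturing group. The following is a sketch of that variation, not part of the original answer; the movetext is shortened for brevity.

```php
<?php
$string = <<<EOF
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
1. d4 d5 2. Nf3 Nf6
EOF;

$final = array();
// The quotes are matched but not captured: group 2 holds only the text between them.
preg_match_all('/\[(.*?)\s+"(.*?)"\]/', $string, $matches, PREG_PATTERN_ORDER);
foreach ($matches[1] as $i => $name) {
    $final[$name] = $matches[2][$i];
}

print_r($final);
```

Calling trim($value, '"') on each captured value would be another way to get the same result without changing the pattern.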


Here is a safer and more compact solution:

$re = '~\[([^]["]*?)\s*"([^]"]+)~';   // Defining the regex
$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6\n7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6\n12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7\n17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0"; 
preg_match_all($re, $str, $matches);  // Getting all matches
print_r(array_combine($matches[1],$matches[2])); // Creating the final array with array_combine
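
Regex details:

  • \[ - an opening [
  • ([^]["]*?) - Group 1: zero or more characters other than [, ] and ", as few as possible
  • \s* - zero or more whitespace characters (used to trim the first value)
  • " - a double quote
  • ([^]"]+) - Group 2: one or more characters other than ] and "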
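
Note that the pattern above stops after group 2, so it does not require the closing quote or the closing bracket. If stricter matching is preferred, a variation along the following lines might work. It is only a sketch: parseTagPairs is a hypothetical helper name, and the assumptions that each tag starts at the beginning of a line and that tag names consist of word characters are mine, not something stated in the question.

```php
<?php
/**
 * Hypothetical helper (not from the answers above): collect
 * [Name "Value"] tag pairs into an associative array.
 */
function parseTagPairs($text)
{
    // With the m flag, ^ matches at the start of every line.
    // \w+ limits tag names to letters, digits and underscores (an assumption).
    // The closing quote and closing bracket are both required.
    $re = '~^\[(\w+)\s+"([^"]*)"\]~m';
    if (!preg_match_all($re, $text, $matches)) {
        return array(); // no tags found, or the regex failed
    }
    return array_combine($matches[1], $matches[2]);
}

$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 2. Nf3 Nf6";
print_r(parseTagPairs($str));
```

The early return keeps array_combine from being called when nothing matched, so the function returns an empty array for text without tags.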


Comments:

Have you tried any regex?

Yes, this is possible. Please show your attempts or research so far. (Another question with a non-descriptive title won't benefit future visitors.)

We don't write code for people here; we help solve coding problems. If we write this for you, we would also have to make the changes for you whenever you need it modified. Start with \[(.+?)\] on regex101 and work from there. It describes what is happening in the panel on the right.

No, this is actually beyond my knowledge of regular expressions. I know it involves something like preg_match($regex, $str, $matches); $array = $matches[0]; but I can't work out where the brackets go in $regex.

What is $regex? Did you write anything there?

array_combine, nice, but I think the OP wants the quotes kept: "GM"