Regex to partially extract PHP code (array definition)

I have PHP code (an array definition) stored in a string like this:

$code=' array(

  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),  
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },

); ';
Is there a regex to extract this array? What I mean is that I want each value kept as its source text, like this:

$array=array(

  0  => '"a"',
 'a' => '$GlobalScopeVar',
 'b' => 'array("nested"=>array(1,2,3))',
 'c' => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',

);

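PS: I did some research trying to find such a regex, but found nothing.
PS2: Gods of Stack Overflow, let me put up a bounty now; I will offer 400 :3

PS3: This will be used in an internal application. I need to extract the arrays from some PHP files for partial "processing". I tried to explain the situation with the following: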

$code=' array(

  0=>"a",
  "a"=>$GlobalScopeVar,
  "b"=>array("nested"=>array(1,2,3)),  
  "c"=>function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },

); ';

preg_match_all('#\s*(.*?)\s*=>\s*(.*?)\s*,?\s*$#m', $code, $m);
$array = array_combine($m[1], $m[2]);
print_r($array);
Output:

Array
(
    [0] => "a"
    ["a"] => $GlobalScopeVar
    ["b"] => array("nested"=>array(1,2,3))
    ["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)
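One limitation of this line-based pattern, pointed out in the comments, is that every element has to sit on its own line. The check below is only a sketch on an assumed single-line input (the variable $oneLine is made up for illustration); it shows the pattern finding one bogus pair instead of two real ones.

```php
<?php
// Same pattern as in the snippet above, applied to a one-line array literal.
$pattern = '#\s*(.*?)\s*=>\s*(.*?)\s*,?\s*$#m';
$oneLine = 'array(0=>"a", "b"=>"c");';   // assumed test input

$count = preg_match_all($pattern, $oneLine, $m);

// The lazy groups stretch to the first "=>" and then to the end of the line,
// so the "key" swallows "array(" and the "value" swallows the second element.
echo $count, "\n";      // 1
echo $m[1][0], "\n";    // array(0
echo $m[2][0], "\n";    // "a", "b"=>"c");
```

So the pattern is fine for input formatted with one element per line, but it should not be relied on for arbitrary array source.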
Regex

Here is the MEGA regex I came up with:

\s*                                     # white spaces
########################## KEYS START ##########################
(?:                                     # We\'ll use this to make keys optional
(?P<keys>                               # named group: keys
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
)                                       # close group: keys
########################## KEYS END ##########################
\s*                                     # white spaces
=>                                      # match =>
)?                                      # make keys optional
\s*                                     # white spaces
########################## VALUES START ##########################
(?P<values>                             # named group: values
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
|                                       # or
array\s*\((?:[^()]|(?R))*\)             # match an array()
|                                       # or
\[(?:[^[\]]|(?R))*\]                    # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
|                                       # or
(?:function\s+)?\w+\s*                  # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\))                 # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\}                     # match { whatever}
)?;?                                    # match ; (optionally)
)                                       # close group: values
########################## VALUES END ##########################
\s*                                     # white spaces
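Output

Known bug (fixed)

Credits
Credit is due for the technique that makes it match nested brackets.

Advice
You should probably use a parser, since a regex is fragile. The token_get_all() answer does a great job of that.

Even though you asked for a regex, this also works with pure PHP. token_get_all is the key function here. For a regex, see the regex-based answers.

The advantage here is that it is more dynamic than a regex. A regex has a static pattern, whereas with token_get_all you can decide what to do after every single token. It even escapes single quotes and backslashes where necessary, which a regex cannot do.

Also, with a regex, even a commented one, it is hard to picture what it is supposed to do; when you look at PHP code, what the code does is much easier to follow.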

$code = ' array(

  0  => "a",
  "a" => $GlobalScopeVar,
  "b" => array("nested"=>array(1,2,3)),  
  "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
  "string_literal",
  12345

); ';

$token = token_get_all("<?php ".$code);
$newcode = "";

$i = 0;
while (++$i < count($token)) { // enter into array; then start.
        if (is_array($token[$i]))
                $newcode .= $token[$i][1];
        else
                $newcode .= $token[$i];

        if ($token[$i] == "(") {
                $ending = ")";
                break;
        }
        if ($token[$i] == "[") {
                $ending = "]";
                break;
        }
}

// init variables
$escape = 0;
$wait_for_non_whitespace = 0;
$parenthesis_count = 0;
$entry = "";

// main loop
while (++$i < count($token)) {
        // don't match commas in func($a, $b)
        if ($token[$i] == "(" || $token[$i] == "{") // ( -> normal parenthesis; { -> closures
                $parenthesis_count++;
        if ($token[$i] == ")" || $token[$i] == "}")
                $parenthesis_count--;

        // begin new string after T_DOUBLE_ARROW
        if (!$escape && $wait_for_non_whitespace && (!is_array($token[$i]) || $token[$i][0] != T_WHITESPACE)) {
                $escape = 1;
                $wait_for_non_whitespace = 0;
                $entry .= "'";
        }

        // here is a T_DOUBLE_ARROW, there will be a string after this
        if (is_array($token[$i]) && $token[$i][0] == T_DOUBLE_ARROW && !$escape) {
                $wait_for_non_whitespace = 1;
        }

        // entry ended: comma reached
        if (!$parenthesis_count && $token[$i] == "," || ($parenthesis_count == -1 && $token[$i] == ")" && $ending == ")") || ($ending == "]" && $token[$i] == "]")) {
                // go back to the first non-whitespace
                $whitespaces = "";
                if ($parenthesis_count == -1 || ($ending == "]" && $token[$i] == "]")) {
                        $cut_at = strlen($entry);
                        while ($cut_at && ord($entry[--$cut_at]) <= 0x20); // 0x20 == " "
                        $whitespaces = substr($entry, $cut_at + 1, strlen($entry));
                        $entry = substr($entry, 0, $cut_at + 1);
                }

                // $escape == true means: there was somewhere a T_DOUBLE_ARROW
                if ($escape) {
                        $escape = 0;
                        $newcode .= $entry."'";
                } else {
                        $newcode .= "'".addcslashes($entry, "'\\")."'";
                }

                $newcode .= $whitespaces.($parenthesis_count?")":(($ending == "]" && $token[$i] == "]")?"]":","));

                // reset
                $entry = "";
        } else {
                // add actual token to $entry
                if (is_array($token[$i])) {
                        $addChar = $token[$i][1];
                } else {
                        $addChar = $token[$i];
                }

                if ($entry == "" && $token[$i][0] == T_WHITESPACE) {
                        $newcode .= $addChar;
                } else {
                        $entry .= $escape?str_replace(array("'", "\\"), array("\\'", "\\\\"), $addChar):$addChar;
                }
        }
}

//append remaining chars like whitespaces or ;
$newcode .= $entry;

print $newcode;

To get the array's data, you can run
print_r(eval("return $newcode;"));
which prints the entries of the array:

Array
(
    [0] => "a"
    [a] => $GlobalScopeVar
    [b] => array("nested"=>array(1,2,3))
    [c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [1] => "string_literal"
    [2] => 12345
)
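As a smaller illustration of the "decide after every token" idea, here is a reduced sketch that only splits the top-level elements and keeps their source text, without rebuilding quoted code. It is not the answer's implementation: the function name split_array_elements and its return shape are made up for this example, and it assumes the input contains a single array literal.

```php
<?php
// Hypothetical helper, not part of the answer above: returns the top-level
// elements of an array literal as ['key' => source|null, 'value' => source].
function split_array_elements(string $code): array
{
    $depth = 0;     // current nesting depth of (), [] and {}
    $key   = null;  // source text of the pending key, if a => was seen
    $buf   = '';    // source text collected for the current element
    $out   = [];

    foreach (token_get_all('<?php ' . $code) as $tok) {
        $id   = is_array($tok) ? $tok[0] : null;  // token id, null for 1-char tokens
        $text = is_array($tok) ? $tok[1] : $tok;  // source text of the token
        $char = is_array($tok) ? null : $tok;     // set only for 1-char tokens

        // "{$" and "${" inside double-quoted strings are closed by a plain "}".
        $opens  = in_array($char, ['(', '[', '{'], true)
               || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES;
        $closes = in_array($char, [')', ']', '}'], true);
        $ends   = $depth === 1 && ($closes || $char === ',');

        if ($opens && $depth++ === 0) {
            continue;                   // opening bracket of the outer array
        }
        if ($closes) {
            $depth--;
        }
        if ($ends) {                    // top-level comma or final bracket
            if (trim($buf) !== '') {    // skips the empty slot after a trailing comma
                $out[] = ['key' => $key, 'value' => trim($buf)];
            }
            $key = null;
            $buf = '';
            if ($depth === 0) {
                break;                  // outer array closed
            }
            continue;
        }
        if ($depth === 1 && $id === T_DOUBLE_ARROW) {
            $key = trim($buf);          // everything before => is the key
            $buf = '';
            continue;
        }
        if ($depth >= 1) {
            $buf .= $text;              // keep the original source text
        }
    }
    return $out;
}

$code = ' array(
  0  => "a",
  "a" => $GlobalScopeVar,
  "b" => array("nested"=>array(1,2,3)),
  "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
  "string_literal",
); ';

foreach (split_array_elements($code) as $e) {
    echo ($e['key'] ?? '(no key)'), ' => ', $e['value'], "\n";
}
```

Keys and values come back as raw source text (so the key "a" still carries its quotes), which leaves the decision about evaluating or re-quoting them to the caller.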

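Here is what you asked for, and it is very compact. Please let me know if you want any adjustments.

Code (can be run directly in PHP)

$array contains your array, just as you wanted.


If you have any questions or need adjustments, let me know. :)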

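The clean way to do this is obviously to use the tokenizer (but keep in mind that the tokenizer alone does not solve the problem).

For the sake of the challenge, I took the regex approach.

The idea is not to describe the PHP syntax, but to describe it in a negative way (in other words, I describe only the basic PHP structures needed to get the result). The advantage of this basic description is that it copes with objects more complex than functions, strings, integers, or booleans. The result is a more flexible pattern that can handle multi-line and single-line comments as well as heredoc/nowdoc syntax.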
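The full pattern is long, so as a much smaller illustration of the two building blocks it relies on (named subpatterns declared in a (?(DEFINE)...) group, and recursion through \g<name>), a sketch could look like the following. The subpattern names str and group and the sample subject are made up for this example; it only matches one balanced pair of parentheses while skipping brackets that appear inside quoted strings.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<str>   " (?> [^"\\]+ | \\. )* " | ' (?> [^'\\]+ | \\. )* ' )
    (?<group> \( (?> [^()"']+ | \g<str> | \g<group> )* \) )
)
\g<group>
~xs
EOD;

// The ")" inside the double-quoted string must not close the outer group.
$subject = 'f(a, ")", g(1, 2)) + h(3)';

preg_match($pattern, $subject, $m);
echo $m[0], "\n";   // (a, ")", g(1, 2))
```

Nothing inside the DEFINE group is matched directly; the subpatterns only run when called, which is what lets a large pattern be written as a small grammar.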



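$array=eval('return'.$code) ...but you didn't hear it from me.

True, I only just noticed the extra single quotes. In that case you may have to bring out the big guns. But I have to ask: can't you do this some other way?

@iim.hlk I may be preparing a surprise :-) I have also prepared some code (which should be "magically waterproof"). It even works where @HamZa's mega regex fails.

In fact, I am currently thinking about offering more than one bounty, because both your answer and @HamZa's are very good :2

Why do you say "just for this case"? Can't this regex extract any kind of correctly structured array?

@iim.hlk It will fail. Basically every "element" needs to be on a new line for it to succeed.

Gosh! It sounds dirty, but I can offer 400 of my rep to get a magic, waterproof, invincible regex... Think about it: with +6000 rep you will get all the ladies ((partly a joke)) :$

@iim.hlk Sorry, but regex doesn't know "waterproof"; you need a parser.

@bwoebi It isn't parsing; I have provided an extended example. I tested it thoroughly and it should work. If you find a bug, don't hesitate to report it.

@bwoebi Intentionally or not, regex is about matching certain patterns. If I didn't write it to match ${"variable"} (for example), then it won't match it. Basically I did my best to match all possible cases, but since I am not an advanced PHP programmer, I missed some cases (that I didn't even know about).

@HamZa Check all the possibilities of the language:

@HamZa Your effort deserves at least 250. I know this regex can be improved, but for now it works for many cases. ((I will also give bwoebi 250))

Hey!! I love the magic here. I don't care whether this is a regex solution, because it really works. In fact, I like how you went beyond the requested solution ((regex)) and gave me what I need... I tested it and it apparently has no bugs*, so congratulations!! You are awarded +250 for this beautiful code. ((Yes, you will have to wait, because I waited 23 hours to give HamZa his 250))

@iim.hlk No magic, just processing the array returned by token_get_all() :-P I will wait for the bounty ;-)

I wanted to start a new bounty of 250 to "pay you what is yours", but it only gives me the 500 option :c

@iim.hlk Didn't you know that if you put another bounty on the same question, you have to double it?

@iim.hlk So now: what will you do? Bounty (I really wrote this code to get the bounty :|) or no bounty?

Hey bro! I hadn't seen this answer until now. It looks really good; in fact I have always liked compact code ((doing more with a few lines)). I can't test it right now, but I really appreciate your effort, thanks :)

Hey bro, I have a small problem with this solution; I think it just needs a little fix. Please check it: maybe there is a problem with the empty($v['stop']) conditional. Thanks for your time.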
    $code='array("aaa", "sdsd" => "dsdsd");'; // fail
    $code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fail
    $code='array("aaa", \'sdsd\' => "dsdsd");'; // succeed
    // Which means, if a value with no keys is followed
    // by key => value and they are using the same quotation
    // then it will fail (first value gets merged with the key)
Output of print $newcode from the token_get_all code:

array(

  0  => '"a"',
  "a" => '$GlobalScopeVar',
  "b" => 'array("nested"=>array(1,2,3))',  
  "c" => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
  '"string_literal"',
  '12345'

) 
$code=' array(
  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),  
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },

); ';

$regex = "~(?xm)
^[\s'\"]*([^'\"\s]+)['\"\s]*
=>\s*+
(.*?)\s*,?\s*$~";

if(preg_match_all($regex,$code,$matches,PREG_SET_ORDER)) {
    $array=array();
    foreach($matches as $match) {
        $array[$match[1]] = $match[2];
    }

    echo "<pre>";
    print_r($array);
    echo "</pre>";

} // END IF
Array
(
    [0] => "a"
    [a] => $GlobalScopeVar
    [b] => array("nested"=>array(1,2,3))
    [c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)
<pre><?php

$code=' array(
  0   => "a",
  "a" => $GlobalScopeVar,
  "b" => array("nested"=>array(1,2,3)),  
  "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';

$pattern = <<<'EOD'
~
# elements
(?(DEFINE)
    # comments
    (?<comMulti> /\* .*? (?:\*/|\z) )                                              # multiline comment
    (?<comInlin> (?://|\#) \N* $ )                                                 # inline comment
    (?<comments> \g<comMulti> | \g<comInlin> )

    # strings
    (?<strDQ> " (?>[^"\\]+|\\.)* ")                                                # double quote string
    (?<strSQ> ' (?>[^'\\]+|\\.)* ')                                                # single quote string
    (?<strHND> <<<(["']?)([a-zA-Z]\w*)\g{-2} (?>\R \N*)*? \R \g{-1} ;? (?=\R|$) )  # heredoc and nowdoc syntax
    (?<string> \g<strDQ> | \g<strSQ> | \g<strHND> )

    # brackets
    (?<braCrl> { (?> \g<nobracket> | \g<brackets> )* } )
    (?<braRnd> \( (?> \g<nobracket> | \g<brackets> )* \) )
    (?<braSqr> \[ (?> \g<nobracket> | \g<brackets> )* ] )
    (?<brackets> \g<braCrl> | \g<braRnd> | \g<braSqr> )

    # nobracket: content between brackets except other brackets
    (?<nobracket> (?> [^][)(}{"'</\#]+ | \g<comments> | / | \g<string> | <+ )+ )

    # ignored elements
    (?<s> \s+ | \g<comments> )
)

# array components
(?(DEFINE)    
    # key
    (?<key> [0-9]+ | \g<string> )

    # value
    (?<value> (?> [^][)(}{"'</\#,\s]+ | \g<s> | / | \g<string> | <+ | \g<brackets> )+? (?=\g<s>*[,)]) )
)
(?J)
(?: \G (?!\A)(?<!\)) | array \g<s>* \( ) \g<s>* \K

    (?: (?<key> \g<key> ) \g<s>* => \g<s>* )? (?<value> \g<value> ) \g<s>* (?:,|,?\g<s>*(?<stop> \) ))
~xsm
EOD;


if (preg_match_all($pattern, $code, $m, PREG_SET_ORDER)) {
    foreach($m as $v) {
        echo "\n<strong>Whole match:</strong> " . $v[0]
           . "\n<strong>Key</strong>:\t" . $v['key']
           . "\n<strong>Value</strong>:\t" . $v['value'] . "\n";
        if (isset($v['stop']))
            echo "\n<strong>done</strong>\n\n"; 

    }
}