PHP: parsing plain text in a way that recognizes custom if statements

Tags: php, regex, arrays

I have the following string:

$string = "The man has {NUM_DOGS} dogs.";

I parse it by running it through the following function:

function parse_text($string)
{
    global $num_dogs;

    $string = str_replace('{NUM_DOGS}', $num_dogs, $string);

    return $string;
}

parse_text($string);

where $num_dogs is a preset variable. Depending on the value of $num_dogs, this can return any of the following strings:

The man has 1 dogs.
The man has 2 dogs.
The man has 500 dogs.

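The problem is that in the case of "The man has 1 dogs.", "dogs" is plural, which is undesirable. I know this can be solved by not using the parse_text function and instead doing the following: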

if ($num_dogs == 1) {
    $string = "The man has 1 dog.";
} else {
    $string = "The man has $num_dogs dogs.";
}


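But in my application I am parsing more than just {NUM_DOGS}, and it would take a lot of lines to write out all the conditions.

I need a shorthand that I can write into the initial $string and run through a parser, ideally one that doesn't limit me to only two true/false possibilities.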

For example, take:

$string = 'The man has {NUM_DOGS} [{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"].';

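Is it clear what is happening at the end? I'm trying to build an array from the part inside the square brackets that follows the vertical bar, then compare the keys of that array with the parsed value of {NUM_DOGS} (which by then is the value of $num_dogs, to the left of the vertical bar), and return the value of the array item with that key.

If that isn't totally confusing, is it possible using the preg_* functions?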

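First off, this is somewhat debatable, but if you can easily avoid it, just pass $num_dogs to the function as an argument, since most people consider global variables evil.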

Next, to get the "s", I usually do this:

$dogs_plural = ($num_dogs == 1) ? '' : 's';

Then just do this:

$your_string = "The man has $num_dogs dog$dogs_plural";

This is basically the same as an if/else block, but it takes fewer lines of code and you only have to write the text once.
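Putting both suggestions together, a minimal sketch might look like the following. The function name describe_dogs is made up for illustration; it takes the count as an argument instead of reading a global.

```php
<?php
// Hypothetical helper: receives the count as a parameter, no global needed.
function describe_dogs(int $num_dogs): string
{
    // In English only a count of exactly 1 is singular; 0 and 2+ take "s".
    $dogs_plural = ($num_dogs == 1) ? '' : 's';

    return "The man has $num_dogs dog$dogs_plural.";
}

$one  = describe_dogs(1);
$many = describe_dogs(500);
```

Here $one holds "The man has 1 dog." and $many holds "The man has 500 dogs.".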

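As for the other part, I'm still confused about what you're trying to do, but I believe you're looking for some way to convert the following:

[{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"]

The simplest way is probably to use a combination of explode() and regex, and then have it do the above.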

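The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.
That seems like an ideal candidate for preg_replace_callback.
Regular expressions for capturing matched brackets, quotes, braces and so on can get quite complicated, and doing it all with regex is actually rather inefficient. In fact, if that is what you need, you would have to write a proper parser.
For this question I'm going to assume a limited level of complexity and tackle it with a two-stage parse using regex.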

First off, the simplest regex I can think of for capturing tokens between curly braces:

/{([^}]+)}/
Let's break that down:

{        # A literal opening brace
(        # Begin capture
[^}]+    # Everything that's not a closing brace (one or more times)
)        # End capture
}        # Literal closing brace
When applied to a string with preg_match_all, the results look something like this:

array (
  0 => array (
    0 => '{TOK_ONE}',
    1 => '{TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',
  ),
  1 => array (
    0 => 'TOK_ONE',
    1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
  ),
)
Looks good so far.
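As a quick check of the first stage, the sketch below runs that pattern over a sample string. The input string is an assumption chosen to contain the two tokens shown in the output.

```php
<?php
// Assumed sample input containing one plain token and one token with options.
$subject = 'A string {TOK_ONE} with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}';

// $matches[0] holds the full matches (braces included),
// $matches[1] holds the first capture group (the text between the braces).
preg_match_all('/{([^}]+)}/', $subject, $matches);

$tokens = $matches[1];
```

$tokens[0] is 'TOK_ONE' and $tokens[1] is the second token together with its option list, ready for the second parsing stage.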
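Note that if you have nested braces in your strings, e.g. {TOK_TWO|0=>"hi{x}y"}, this regex will not work. If that isn't a problem, skip down to the next section.
It is possible to do top-level matching, but the only way I have ever managed it is via recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.
This is where the additional processing complexity kicks in, and with long, complicated strings it is very easy to run out of stack space and crash your program. Use it carefully, if you need to use it at all.
The recursive regex, slightly modified: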

/{((?:[^{}]*|(?R))*)}/
Broken down:

{            # literal brace
(            # begin capture
    (?:      # don't create another capture set
    [^{}]*   # everything not a brace
    |(?R)    # OR recurse
    )*       # none or more times
)            # end capture
}            # literal brace

This time the output only matches on the top-level braces:

array (
  0 => array (
    0 => '{TOK_ONE|0=>"a {nested} brace"}',
  ),
  1 => array (
    0 => 'TOK_ONE|0=>"a {nested} brace"',
  ),
)
Again, don't use the recursive regex unless you have to. (Your system may not even support it if it has an old PCRE library.)
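To see the recursive pattern in action, here is a small sketch. The surrounding "x" and "y" in the input are an assumption added only to show that text outside the braces is ignored.

```php
<?php
// Assumed sample input: one top-level token whose option text contains braces.
$subject = 'x {TOK_ONE|0=>"a {nested} brace"} y';

// (?R) re-enters the whole pattern, so an inner {...} pair is consumed as
// part of the outer match instead of ending it early.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $subject, $matches);

$full  = $matches[0][0]; // outer match, braces included
$inner = $matches[1][0]; // text between the outermost braces
```

Only one match is reported: the nested {nested} stays inside $inner rather than showing up as a token of its own.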

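With that out of the way, we need to work out whether a token has options associated with it. Instead of having two fragments to match as in your question, I'd recommend keeping the options with the token as in my examples: {TOKEN|0=>"option"}
Let's assume $match contains a matched token. If we check for a pipe | and take the substring of everything after it, we are left with the list of options, which again we can parse with regex. (Don't worry, I'll bring everything together at the end.)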

/(\d)+\s*=>\s*"([^"]*)",?/
Broken down:


(\d)+      # One or more decimal digits (the capture keeps only the last digit)
\s*        # Any amount of whitespace (allows you to do 0 => "")
=>         # Literal pointy arrow
\s*        # Any amount of whitespace
"          # Literal quote
([^"]*)    # Capture anything that isn't a quote
"          # Literal quote
,?         # Maybe followed by a comma
And an example:

array (
  0 => array (
    0 => '0=>"no",',
    1 => '1 => "one",',
    2 => '2=>"two"',
  ),
  1 => array (
    0 => '0',
    1 => '1',
    2 => '2',
  ),
  2 => array (
    0 => 'no',
    1 => 'one',
    2 => 'two',
  ),
)
If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
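The second stage can be sketched on its own as follows. This variant writes the key group as (\d+) rather than (\d)+ so that a multi-digit key such as 10 would be captured whole; that change is my assumption, not part of the pattern above.

```php
<?php
// Assumed option list, as it would appear after the pipe in a token.
$optionList = '0=>"no", 1 => "one", 2=>"two"';

// Group 1 captures the numeric key, group 2 the quoted text.
preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $optionList, $m);

// Pair every key with its text. PHP turns numeric string keys into integers.
$map = array_combine($m[1], $m[2]);
```

Looking up the option for a given value is then a plain array access such as $map[1].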

Finally, here is a working example.
Some initialization code:

$options = array(
    'WERE'   => 1,
    'TYPE'   => 'cat',
    'PLURAL' => 1,
    'NAME'   => 2
);

$string = 'There {WERE|0=>"was a",1=>"were"} ' .
    '{TYPE}{PLURAL|1=>"s"} named bob' .
    '{NAME|1=>" and bib",2=>" and alice"}';
Bringing it all together:

$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
    $match = $match[1];

    // Split "NAME|options" at the first pipe, if there is one.
    if (false !== $pipe = strpos($match, '|')) {
        $tokens = substr($match, $pipe + 1);
        $match = substr($match, 0, $pipe);
    } else {
        $tokens = array();
    }

    if (isset($options[$match])) {
        if ($tokens) {
            // Build a key => text map from the option list and pick one entry.
            preg_match_all('/(\d)+\s*=>\s*"([^"]*)",?/', $tokens, $tokens);
            $tokens = array_combine($tokens[1], $tokens[2]);
            return $tokens[$options[$match]];
        }
        return $options[$match];
    }
    return '';
}, $string);
Note that the error checking is minimal; you will get unexpected results if you pick an option that doesn't exist.
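If the missing-option case matters, one possible hardening is sketched below. The function name render_tokens is hypothetical, and the choice to return an empty string for an unknown token or an unlisted option is an assumption about the desired behavior. It needs PHP 7 or later for the ?? operator.

```php
<?php
// Hypothetical wrapper around the callback approach with two changes:
// multi-digit keys via (\d+), and '' when no option matches.
function render_tokens(string $template, array $options): string
{
    return preg_replace_callback('/{([^}]+)}/', function (array $m) use ($options) {
        // Split "NAME|options" into the token name and its option list.
        $parts = explode('|', $m[1], 2);
        $name  = $parts[0];

        if (!isset($options[$name])) {
            return ''; // unknown token
        }
        if (!isset($parts[1])) {
            return (string) $options[$name]; // plain {NAME} substitution
        }

        preg_match_all('/(\d+)\s*=>\s*"([^"]*)"/', $parts[1], $pairs);
        $choices = array_combine($pairs[1], $pairs[2]);

        // Fall back to '' instead of triggering an undefined-key warning.
        return $choices[$options[$name]] ?? '';
    }, $template);
}

$result = render_tokens(
    'There {WERE|0=>"was a",1=>"were"} {TYPE}{PLURAL|1=>"s"} named bob{NAME|1=>" and bib",2=>" and alice"}',
    ['WERE' => 1, 'TYPE' => 'cat', 'PLURAL' => 1, 'NAME' => 2]
);
```

With the options shown, $result is "There were cats named bob and alice". Setting PLURAL to 0 simply drops the "s", because 0 is not listed among that token's options.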

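There is probably a much simpler way to do all of this, but I just took the idea and ran with it.
In a pinch, I have done something similar to what you are asking with an implementation like the code below.
It is nowhere near as feature-rich as @Mike's answer, but it has done the trick in the past.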

/**
 * This function pluralizes words, as appropriate.
 *
 * It is a completely naive, example-only implementation.
 * There are existing "inflector" implementations that do this
 * quite well for many/most *English* words.
 */
function pluralize($count, $word)
{
    if ($count === 1) {
        return $word;
    }
    return $word . 's';
}

/**
 * Matches template patterns in the following forms:
 *   {NAME}      - Replaces {NAME} with value from $values['NAME']
 *   {NAME:word} - Replaces {NAME:word} with 'word', pluralized using
 *                 the pluralize() function above.
 */
function parse($template, array $values)
{
    $callback = function ($matches) use ($values) {
        $number = $values[$matches['name']];
        if (array_key_exists('word', $matches)) {
            return pluralize($number, $matches['word']);
        }
        return $number;
    };

    $pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';
    return preg_replace_callback($pattern, $callback, $template);
}
Output:
The man has 2 dogs
The man has 1 dog
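The call that produces this output is not shown, so here is a self-contained sketch of how it might look. The template string is an assumption: since {NAME:word} yields only the word, the number and the noun are written as two separate placeholders. The two functions are condensed restatements so the snippet runs on its own.

```php
<?php
// Condensed restatement of pluralize() and parse() from the answer.
function pluralize($count, $word)
{
    return ($count === 1) ? $word : $word . 's';
}

function parse($template, array $values)
{
    $pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';

    return preg_replace_callback($pattern, function ($matches) use ($values) {
        $number = $values[$matches['name']];

        // {NAME:word} yields the (possibly pluralized) word, {NAME} the number.
        return array_key_exists('word', $matches)
            ? pluralize($number, $matches['word'])
            : $number;
    }, $template);
}

// Assumed template, not taken from the original answer.
$template = 'The man has {NUM_DOGS} {NUM_DOGS:dog}';

$two = parse($template, ['NUM_DOGS' => 2]);
$one = parse($template, ['NUM_DOGS' => 1]);
```

Note that pluralize() compares with ===, so the count must be passed as an integer; the string '1' would be treated as plural.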
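For what it's worth, in larger projects I have always ended up abandoning any custom-rolled inflection, which seems to be the most sensible approach once multiple languages are required.
This is lifted from a reply flussence posted in 2009:
You might want to look at the gettext extension. More specifically, it sounds like ngettext() would do what you want: it pluralizes words correctly as long as you have a number to count from.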

print ngettext('odor', 'odors', 1); // prints "odor"
print ngettext('odor', 'odors', 4); // prints "odors"
printf(ngettext('%d cat', '%d cats', 4), 4); // prints "4 cats"
You can also make it handle translated plural forms correctly, i.e.