PHP: removing redundant words from a string range

php string

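I have ranges of information. For example:

Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 5 Page 10

What is the fastest way to remove the redundant information and convert it to:

Volume 1 Chapter 3 Page 5 TO Chapter 5 Page 10

Or, if the input is

Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 3 Page 10

then the output should be

Volume 1 Chapter 3 Page 5 TO Page 10

The hardest part here is splitting the input into tokens, because it is not well structured. I used a recursive function that sequentially strips duplicates of the first element from the string. It gives the correct output for this input, but I am not sure it is 100% correct, because the input structure is unclear: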

<?php
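$str = 'Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 3 Page 10';
$str = clear_first_element_duplicates($str);
var_dump($str);

// Recursively takes the first "Label N" element off the front of the string
// and removes every later repetition of it from the remainder.
function clear_first_element_duplicates($str)
{
    // $tokens[1] = first element up to its number, $tokens[2] = the rest
    if (preg_match('/(.*?\d)\s(.*)/', $str, $tokens))
    {
        // Escape the "/" delimiter too; \b keeps "Volume 1" from matching inside "Volume 10"
        $regexp = preg_quote($tokens[1], '/');
        $str = preg_replace("/$regexp\b\s?/", '', $tokens[2]);
        return $tokens[1]." ".clear_first_element_duplicates($str);
    }

    return $str;
}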
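
The tokenizing step that the question calls the hardest part can be sketched with a single `preg_match_all` call. This is only a minimal sketch under stated assumptions: every level is written as one word followed by a number, and the function name `tokenize_levels` is hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper: turn one side of the range, e.g. "Volume 1 Chapter 3 Page 5",
// into a label => number map. Assumes single-word labels, each followed by an integer.
function tokenize_levels(string $part): array
{
    // Group 1 captures the label (no digits, no whitespace), group 2 the number.
    preg_match_all('/([^\d\s]+)\s*(\d+)/u', $part, $m);

    // If a label repeats, the later value overwrites the earlier one,
    // so call this on one side of the separator at a time.
    return array_combine($m[1], array_map('intval', $m[2]));
}
```

Calling `tokenize_levels('Volume 1 Chapter 3 Page 5')` yields `['Volume' => 1, 'Chapter' => 3, 'Page' => 5]`, which makes a level-by-level comparison of the two sides straightforward.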

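My script looks complicated, but it is worth a try:

I added variable levels, so it is not limited to Volume, Chapter and Page; if needed you can add, for example, Paragraph, Line and Character, and even change the wording. See the last examples.

**Be careful with the $separator parameter: it must match exactly (it is case sensitive) and may appear only once in the string. This is easy to fix, but I focused on the important part of the function.**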

function redundancy($string, $separator){
    list($a, $b) = explode($separator, $string);

    // numeric values of both sides, in order of appearance
    $pattern = '/[0-9]+/';
    preg_match_all($pattern, $a, $a_values);
    preg_match_all($pattern, $b, $b_values);

    $a_values = $a_values[0];
    $b_values = $b_values[0];

    // level names (wording) of the left side, with the numbers filtered out;
    // this could probably be done with a single, better regex
    $words = [];
    preg_match_all('/\b\w+\b/', $a, $matches);
    foreach($matches[0] as $match){
        if(!is_numeric($match)) $words[] = $match;
    }

    // algorithm
    $length = count($a_values) - 1; // the last element is excluded and checked separately
    $output = $a.$separator." ";
    $same_full_path = true; // stays true while every parent level is identical on both sides
    $same_parent = true; // false once any previous level differed
    for($i = 0; $i < $length; $i++){
        // once one level differs, that level and all levels below it are kept
        if($a_values[$i] !== $b_values[$i] || $same_parent === false){
            $same_parent = false;
            $same_full_path = false;
            $output .= $words[$i]." ".$b_values[$i]." ";
        }
    }

    // the last element is special, so it is handled outside the loop:
    // its word is only added when a parent level changed or the last values are equal
    if($same_full_path === false || end($a_values) === end($b_values)) $output .= end($words)." ";
    $output .= end($b_values);

    echo "$string <br/> $output <br/><br/> ";
}

redundancy('Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 5 Page 10', 'TO');
redundancy('Serie 1 Season 2 Chapter 2 Minute 5 Second 6 Until Serie 1 Season 3 Chapter 4 Minute 3 Second 1', 'Until');
redundancy('District 4 Building 2 Floor 4 Door 5 To District 4 Building 2 Floor 4 Door 8', 'To');

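Output:

Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 5 Page 10

Volume 1 Chapter 3 Page 5 TO Chapter 5 Page 10

Serie 1 Season 2 Chapter 2 Minute 5 Second 6 Until Serie 1 Season 3 Chapter 4 Minute 3 Second 1

Serie 1 Season 2 Chapter 2 Minute 5 Second 6 Until Season 3 Chapter 4 Minute 3 Second 1

District 4 Building 2 Floor 4 Door 5 To District 4 Building 2 Floor 4 Door 8

District 4 Building 2 Floor 4 Door 5 To 8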
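
The caveat about `$separator` (exact case, single occurrence) could be relaxed by splitting with a regular expression instead of `explode`. The following is only a sketch, not the answer's code: `split_range` is a hypothetical name, and it assumes the separator is always surrounded by whitespace.

```php
<?php
// Hypothetical helper: split a range into its left and right side.
// Matches the separator case-insensitively, only as a standalone word
// (surrounded by whitespace), and splits on its first occurrence only.
function split_range(string $range, string $separator): array
{
    $pattern = '/\s+' . preg_quote($separator, '/') . '\s+/i';
    $parts = preg_split($pattern, trim($range), 2);

    if ($parts === false || count($parts) !== 2) {
        throw new InvalidArgumentException("Separator '$separator' not found in '$range'");
    }

    return array_map('trim', $parts);
}
```

Because the pattern requires whitespace on both sides, a separator such as `TO` is not matched inside a longer word, and the limit of 2 leaves any later occurrence in the right-hand part.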

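"Remove redundant information": why not convert it to "Volume 1 Chapter 3 Page 5 TO 10"?

What if the data is "Volume 1 Chapter 3 Page 5 TO Volume 1 Chapter 5 Page 10"? What would the result be?

Yes, why do you think "Page" is not redundant in the page range? What are the criteria for redundancy?

Besides the numbers, can the wording change as well?

I think we need a more complex solution. I suggest getting the numeric values and doing the math to arrive at the correct final wording.

@devnull Correct. It should be Chapter 5.

$str = 'Volume 1 Chapter 3 Page 5 TO Volume 2 Chapter 3 Page 10'; will ignore the chapter on Volume 2; it will output "Volume 2 Page 10".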
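
The last comment points at the rule a correct solution needs: once a higher level differs (Volume 1 vs. Volume 2), every level below it has to be kept, even if its number happens to repeat. A minimal sketch of that rule follows. It assumes that both sides list the same single-word levels in the same order, that the separator occurs exactly as given, and that the last level always keeps its label (as in the question's "TO Page 10"). The names `parse_levels` and `compress_range` are hypothetical.

```php
<?php
// Hypothetical helper: "Volume 1 Chapter 3" -> [['Volume', 1], ['Chapter', 3]]
function parse_levels(string $part): array
{
    preg_match_all('/([^\d\s]+)\s*(\d+)/u', $part, $m, PREG_SET_ORDER);
    return array_map(fn($x) => [$x[1], (int) $x[2]], $m);
}

// Drops the leading levels of the right side that are identical to the left side.
function compress_range(string $range, string $separator): string
{
    $parts = explode($separator, $range, 2);
    if (count($parts) !== 2) {
        return $range; // no separator: nothing to compress
    }
    [$left, $right] = $parts;

    $from = parse_levels($left);
    $to   = parse_levels($right);
    if (count($to) === 0 || count($from) !== count($to)) {
        return $range; // structure differs: leave the input untouched
    }

    $last = count($to) - 1;
    $keep = false; // becomes true at the first differing level and stays true
    $kept = [];
    foreach ($to as $i => [$label, $value]) {
        if ($i === $last || $label !== $from[$i][0] || $value !== $from[$i][1]) {
            $keep = true;
        }
        if ($keep) {
            $kept[] = "$label $value";
        }
    }

    return trim($left) . " $separator " . implode(' ', $kept);
}
```

With this rule, the input from the comment stays unchanged, because the differing volume forces "Chapter 3" to be kept, while the two inputs from the question are shortened to "... TO Chapter 5 Page 10" and "... TO Page 10".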
