PHP: Using a regular expression to find items where a matched group is greater/less than the previous one

Tags: php, regex, pcre, regex-negation, regex-lookarounds

Given the following text:

<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p> // Should match
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p> // Should match
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p> // Should match
<p>Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
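I can't use DomDocument because the markup I receive is not always valid (it usually comes from a Microsoft Office > HTML conversion), so I am using a regular expression to solve this.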

My current regular expression is:

(?!<p style=".*?(margin-left:\s?(?!\k'margin')px;).*?">\* .*?<\/p>)<p style="(?P<styles>.*?)margin-left:\s?(?P<margin>[0-9]{1,3})px;?">\* (?P<listcontent>.*)<\/p>

But this only matches based on the presence of a preceding element that is a p with a margin-left.

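How can I factor in the value of the matched margin-left group, and return only the items whose value is greater than that of the previous match?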
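
One way to see why this is hard to express in the pattern itself: a named backreference such as \k'margin' only checks that the same text appears again, it does not compare numbers. The following is a minimal sketch of that behaviour, not part of the original question; the two sample strings and the simplified pattern are assumptions made up for illustration.

```php
<?php

// Two consecutive paragraphs with EQUAL margins.
$equal = '<p style="margin-left: 10px">* Item 1</p>' . "\n"
       . '<p style="margin-left: 10px">* Item 2</p>';

// Two consecutive paragraphs with INCREASING margins (the case the question cares about).
$increasing = '<p style="margin-left: 10px">* Item 2</p>' . "\n"
            . '<p style="margin-left: 20px">* Sub Item 1a</p>';

// (?P<margin>\d+) captures the first margin; \k'margin' then requires
// exactly the same digits on the following line.
$pattern = '/margin-left:\s?(?P<margin>\d+)px[^\n]*\n[^\n]*margin-left:\s?\k\'margin\'px/';

// The backreference succeeds only when the digits repeat verbatim.
$equalMatches = preg_match($pattern, $equal);

// "20" is not the same text as "10", so the pattern does not match here,
// and there is no backreference form meaning "a larger number than the capture".
$increasingMatches = preg_match($pattern, $increasing);

var_dump($equalMatches, $increasingMatches);
```

Negating the backreference would likewise only express "different from", which also accepts smaller margins, so the ordering still has to be checked outside the pattern.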


I have created an example to demonstrate the problem, with sample data and the current output.

This code works as expected: it uses a regular expression to grab each element, then loops over the matches and checks the business logic:

<?php

$data = '<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p>
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p>
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p>
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p>
<div>Some text</div>
<p style="color:blue; margin-left: 10px">* Item 1</p>';

// Get all HTML tags, the element in [1], the attributes (style etc) in [2], the content in [3]
preg_match_all('/<(\w+)\b([^>]*)>(.*?)<\/\w+>/', $data, $matches);

$results = [];

// Keep track of the previous element's margin-left. If it is missing it is set to 0,
// so the next element is included automatically if it has a margin-left
$lastMarginLeft = 0;

// Loop through matches and apply business rules
for ($i = 0; $i < count($matches[0]); $i++) {
    /**
     * Business rules:
     * - Contents begins with an asterisk character
     * - Elements have a margin-left inline style
     * - The preceding content is either:
     *   - A p element which has no margin-left
     *   - A p element with a margin-left which is lower than the matched element
     *   - Any other element
     */

    // Assume no margin-left found by default
    $marginLeft = 0;

    // Check element has a margin-left
    if (strpos($matches[2][$i], 'margin-left') !== false) {
        // Extract margin-left value
        preg_match("/margin-left:\s?(\d+)/", $matches[2][$i], $value);
        $marginLeft = isset($value[1]) ? (int) $value[1] : 0;

        // Check if this margin is greater than the last
        if ($marginLeft > $lastMarginLeft) {
            // Check content
            if (strpos($matches[3][$i], '*') === 0) {
                $results[] = $matches[0][$i];
            }
        }
    }

    // Capture margin left for next run
    $lastMarginLeft = $marginLeft;
}

print_r($results);

// Results:
// Array
// (
//     [0] => <p style="color:blue; margin-left: 10px">* Item 1</p>
//     [1] => <p style="margin-left: 20px">* Sub Item 1a</p>
//     [2] => <p style="margin-left: 20px">* Sub Item 1b</p>
//     [3] => <p style="margin-left: 30px">* Sub Item 1c</p>
//     [4] => <p style="color:blue; margin-left: 10px">* Item 1</p>
// )

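Does this have to be done in a single pass/method? Could you match all the tags and then use PHP to reduce the set?

Because this is part of a chain of operations, it needs to be done as a single regular expression if possible.

I don't believe this is possible in a single pass, because you need to compare values against other values. You will be able to find the p elements that contain a margin-left, but you will need a secondary pass to do the comparison.

If it can't be done as a single regular expression and you have a way to solve it with two passes, please write it up as an answer.

I'll write an answer. One question: how does the last element match? It is preceded by a p element with no margin-left, so doesn't it fail the "any element other than p" test?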
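
Since the asker mentions that this step is part of a chain of operations, the two-pass approach discussed in the comments could be wrapped in one function so that it still behaves as a single step in that chain. The following is only a sketch under that assumption: the function name findIndentedItems and the named groups tag, attrs and content are hypothetical and do not come from the original post.

```php
<?php

/**
 * Hypothetical helper (not from the original post): returns the list items
 * whose margin-left is greater than that of the element directly before them.
 */
function findIndentedItems(string $html): array
{
    // First pass: collect every element. PREG_SET_ORDER yields one array per
    // element, so each match carries its own tag, attributes and content.
    preg_match_all(
        '/<(?P<tag>\w+)\b(?P<attrs>[^>]*)>(?P<content>.*?)<\/\w+>/',
        $html,
        $elements,
        PREG_SET_ORDER
    );

    $results = [];
    $lastMarginLeft = 0; // elements without a margin-left count as 0

    foreach ($elements as $element) {
        // Second pass: pull the numeric margin out of this element's attributes.
        $marginLeft = preg_match('/margin-left:\s?(\d+)/', $element['attrs'], $m)
            ? (int) $m[1]
            : 0;

        // The numeric comparison happens in PHP, not in the regular expression.
        if ($marginLeft > $lastMarginLeft && strpos($element['content'], '*') === 0) {
            $results[] = $element[0];
        }

        // Remember this margin for the next element.
        $lastMarginLeft = $marginLeft;
    }

    return $results;
}

$items = findIndentedItems(
    '<p style="margin-left: 10px">* Item 1</p>' . "\n" .
    '<p style="margin-left: 10px">* Item 2</p>' . "\n" .
    '<p style="margin-left: 20px">* Sub Item 1a</p>'
);

print_r($items);
```

Callers then see a single call that takes markup and returns the matching elements, even though two regular expressions and a loop run inside it.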