PHP: Using a regular expression to find items whose matched group is greater/less than the previous match
Given the following text:
<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p> // Should match
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p> // Should match
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p> // Should match
<p>Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
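I can't use DomDocument, because the markup I receive isn't always valid (it usually comes from Microsoft Office > HTML conversions), so I'm using regular expressions to work around this.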
My current regular expression is:
(?!<p style=".*?(margin-left:\s?(?!\k'margin')px;).*?">\* .*?<\/p>)<p style="(?P<styles>.*?)margin-left:\s?(?P<margin>[0-9]{1,3})px;?">\* (?P<listcontent>.*)<\/p>
(?!\*(?p.*)
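But this only matches based on the existence of preceding elements that are a p element with a margin-left.

How can I take the value captured in the margin group into account and return only the items whose margin is greater than that of the previous match?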
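To illustrate why the margin group can't be compared directly: a PCRE backreference such as \k<name> (or the \k'margin' in the regex above) only tests whether two captured strings are textually identical. The following is a minimal sketch, assuming one element per line as in the sample data; it uses a lookahead plus a backreference to find each item whose margin equals the margin of the line before it. The group names m and next are illustrative choices, not anything from the original regex.

```php
<?php
$data = '<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p>
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p>
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p>
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p>
<div>Some text</div>
<p style="color:blue; margin-left: 10px">* Item 1</p>';

// Line N captures its margin digits into (?<m>...). The lookahead then requires
// line N+1 to contain exactly the same digits via \k<m> and captures that line
// as "next". Because line N+1 sits in a lookahead, it is not consumed, so runs
// of three or more equal margins are still found.
// Assumption: one element per line and margins written as "margin-left: <n>px".
$pattern = '/^.*margin-left:\s?(?<m>\d+)px.*$(?=\R(?<next>.*margin-left:\s?\k<m>px.*))/m';

preg_match_all($pattern, $data, $matches);

// Lines whose margin is identical to the previous line's margin
print_r($matches['next']);
```

On the sample this reports Item 2, Sub Item 2a and Sub Item 2b. Item 3 (10px after a 20px line) also differs from the line before it but must not match, so neither "equal" nor "not equal" expresses the rule; the "greater than" comparison has to be done numerically outside the pattern.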
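I've created an example that demonstrates the problem, with sample data and the current output. This code works as intended: it uses a regular expression to get every element, then loops through them and checks the business logic: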
<?php
$data = '<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p>
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p>
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p>
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p>
<div>Some text</div>
<p style="color:blue; margin-left: 10px">* Item 1</p>';

// Get all HTML tags: the element name in [1], the attributes (style etc.) in [2], the content in [3].
// ([^>]*) captures the same text as ([^>]+)* but avoids a nested quantifier,
// which can backtrack excessively on malformed (e.g. unclosed) tags.
preg_match_all("/<(\w+)\b([^>]*)>(.*?)<\/\w+>/", $data, $matches);

$results = [];

// Keep track of the previous element's margin-left. If it is missing it is treated as 0,
// so the next element is included automatically if it has a margin-left.
$lastMarginLeft = 0;

// Loop through the matches and apply the business rules.
// Use < rather than <=: valid indexes run from 0 to count - 1.
for ($i = 0; $i < count($matches[0]); $i++) {
    /**
     * Business rules:
     * - Content begins with an asterisk character
     * - Elements have a margin-left inline style
     * - The preceding content is either:
     *   - A p element which has no margin-left
     *   - A p element with a margin-left which is lower than the matched element
     *   - Any other element
     */

    // Assume no margin-left found by default
    $marginLeft = 0;

    // Check the element has a margin-left
    if (strpos($matches[2][$i], 'margin-left') !== false) {
        // Extract the margin-left value as an integer
        preg_match("/margin-left:\s?(\d+)/", $matches[2][$i], $value);
        $marginLeft = isset($value[1]) ? (int) $value[1] : 0;

        // Check if this margin is greater than the last
        if ($marginLeft > $lastMarginLeft) {
            // Check the content starts with an asterisk
            if (strpos($matches[3][$i], '*') === 0) {
                $results[] = $matches[0][$i];
            }
        }
    }

    // Capture margin-left for the next iteration
    $lastMarginLeft = $marginLeft;
}

print_r($results);

// Results:
// Array
// (
//     [0] => <p style="color:blue; margin-left: 10px">* Item 1</p>
//     [1] => <p style="margin-left: 20px">* Sub Item 1a</p>
//     [2] => <p style="margin-left: 20px">* Sub Item 1b</p>
//     [3] => <p style="margin-left: 30px">* Sub Item 1c</p>
//     [4] => <p style="color:blue; margin-left: 10px">* Item 1</p>
// )
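Does this have to be done in a single pass? Could you match all the tags and then use PHP to reduce the set?

This is part of a series of operations, so if at all possible it needs to be done as a single regular expression.

I don't believe this is possible in a single pass, because you need to compare values with other values. You can find the p elements that have a margin-left, but you need a second pass to do the comparison.

If it can't be done as a single regular expression and you have a way to solve it with two passes, please write it up as an answer.

I'll write an answer. One question: how does the last element match? It is preceded by a p element that has no margin-left, so doesn't it fail the "any element other than a p element" test?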