Extracting data from an HTML table with a PHP regular expression


php html regex


I'm trying to write a regular expression to extract some data from a table.

The markup I have right now is:

<table>
   <tr>
     <td>quote1</td>
     <td>have you trying it off and on again ?</td>
   </tr>
   <tr>
     <td>quote65</td>
     <td>You wouldn't steal a helmet of a policeman</td>
   </tr>
</table>

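I want to replace it with the following:

quote1:have you trying it off and on again ?

quote65:You wouldn't steal a helmet of a policeman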

The pattern I have written so far is:

%<td>((?s).*?)</td>%

But now I'm stuck.

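As usual, extracting text from HTML and other non-regular languages should be done with a parser; regular expressions are likely to cause problems here. But if you are certain about the structure of your data, you could use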

%<td>((?s).*?)</td>\s*<td>((?s).*?)</td>%

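to find the two pieces of text. \1:\2 would then be the replacement.

If the text cannot span multiple lines, it is safer to drop the (?s) bits.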
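As a rough sketch of how that replacement could be applied, the snippet below feeds the two-cell pattern to preg_replace and then strips the remaining table markup. The variable names and the strip_tags clean-up step are assumptions added for illustration; the answer itself only gives the pattern and the replacement string.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<table>
   <tr>
     <td>quote1</td>
     <td>have you trying it off and on again ?</td>
   </tr>
   <tr>
     <td>quote65</td>
     <td>You wouldn't steal a helmet of a policeman</td>
   </tr>
</table>
HTML;

// Two adjacent <td> cells; (?s) lets "." match newlines inside each group.
$pattern = '%<td>((?s).*?)</td>\s*<td>((?s).*?)</td>%';

// $1 and $2 are PHP's spelling of the \1 and \2 backreferences.
$replaced = preg_replace($pattern, '$1:$2', $html);

// Assumption: the surrounding <table>/<tr> tags are not wanted, so they are
// stripped and the blank lines left behind are discarded.
$lines = array_values(array_filter(
    array_map('trim', preg_split('/\R/', strip_tags($replaced))),
    'strlen'
));

echo implode("\n", $lines), "\n";
```

Note that preg_replace only rewrites the matched cells; everything else in the input, including the row and table tags, passes through unchanged, which is why a separate clean-up step is needed here.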

Don't use a regular expression for this; use an HTML parser.

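Tim's regular expression may well work, but you might want to consider using PHP's DOM functionality instead of a regex, since it is likely to be more reliable when the markup varies slightly.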
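A minimal sketch of the DOM approach, assuming the table from the question and that every row of interest holds at least two cells. It uses PHP's bundled DOMDocument class; the variable names are illustrative only.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<table>
   <tr>
     <td>quote1</td>
     <td>have you trying it off and on again ?</td>
   </tr>
   <tr>
     <td>quote65</td>
     <td>You wouldn't steal a helmet of a policeman</td>
   </tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$quotes = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    // Assumption: rows with fewer than two cells are not quotes and are skipped.
    if ($cells->length >= 2) {
        $quotes[] = trim($cells->item(0)->textContent)
            . ':' . trim($cells->item(1)->textContent);
    }
}

echo implode("\n", $quotes), "\n";
```

Because the parser works on elements rather than on the raw text, this version should keep working if the cells gain attributes or the whitespace between tags changes.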


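If you really want to use a regular expression, and you are really sure the string will always have this format, then in your case you could do something like this: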

$str = <<<A
<table>
   <tr>
     <td>quote1</td>
     <td>have you trying it off and on again ?</td>
   </tr>
   <tr>
     <td>quote65</td>
     <td>You wouldn't steal a helmet of a policeman</td>
   </tr>
</table>
A;

$matches = array();
preg_match_all('#<tr>\s+?<td>(.*?)</td>\s+?<td>(.*?)</td>\s+?</tr>#', $str, $matches);

var_dump($matches);

Then you only need to work with this array, using some string concatenation or something similar; for instance:

$num = count($matches[1]);
for ($i=0 ; $i<$num ; $i++) {
    echo $matches[1][$i] . ':' . $matches[2][$i] . '<br />';
}

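Note: you should add some safety checks. preg_match_all returns the number of full matches, or false on failure, so check that it did not return false and that the count is at least 1.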
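One way those checks might look is sketched below. The function name extractQuotes is hypothetical, and throwing an exception on failure is an assumption made for this example; the pattern is the one from the answer.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the answer):
// returns "label:text" lines, or an empty array when no row matches.
function extractQuotes(string $html): array
{
    $pattern = '#<tr>\s+?<td>(.*?)</td>\s+?<td>(.*?)</td>\s+?</tr>#';
    $found = preg_match_all($pattern, $html, $matches);

    // false means the regex engine failed; it is different from "no matches".
    if ($found === false) {
        throw new RuntimeException(
            'preg_match_all failed, PCRE error code ' . preg_last_error()
        );
    }

    // Zero matches: nothing to build.
    if ($found < 1) {
        return [];
    }

    $lines = [];
    for ($i = 0; $i < $found; $i++) {
        $lines[] = $matches[1][$i] . ':' . $matches[2][$i];
    }
    return $lines;
}

// Sample input copied from the question.
$str = <<<'HTML'
<table>
   <tr>
     <td>quote1</td>
     <td>have you trying it off and on again ?</td>
   </tr>
   <tr>
     <td>quote65</td>
     <td>You wouldn't steal a helmet of a policeman</td>
   </tr>
</table>
HTML;

foreach (extractQuotes($str) as $line) {
    echo $line, "\n";
}
```

Comparing with === false matters here, because a return value of 0 is a valid result that simply means the pattern did not match.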

As a side note: parsing HTML with regular expressions is generally not a good idea; if you can use a real parser, it should be safer.


Rendered in a browser, the loop shown earlier produces:

quote1:have you trying it off and on again ?
quote65:You wouldn't steal a helmet of a policeman

    // $response is assumed to hold the HTML; [^>]* also allows attributes on <td>
    preg_match_all('%<td[^>]*>((?s).*?)</td>%', $response, $matches);
    var_dump($matches);