PHP - Split an HTML attribute string into an indexed array


php html


I have a string with HTML attributes:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

How can I transform that string into an indexed array like this:

array(
  'id' => 'header',
  'class' => array('foo', 'bar'),
  'style' => array(
    'background-color' => '#fff',
    'color' => 'red'
  )
)

so that I can merge two sets of HTML attributes with PHP's array_merge_recursive function?

Thanks
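To illustrate the goal, here is a minimal sketch of what array_merge_recursive does with two attribute sets that are already in the nested shape above. The sample values are made up for illustration. Note that when both arrays have the same string key with scalar values (a repeated color inside style, for instance), array_merge_recursive combines them into an array instead of letting the second one win, so the result may still need post-processing:

```php
<?php
// Two attribute sets already parsed into the nested shape from the question
// (illustrative values only).
$first = [
    'id'    => 'header',
    'class' => ['foo', 'bar'],
    'style' => ['background-color' => '#fff', 'color' => 'red'],
];
$second = [
    'class' => ['lorem'],
    'style' => ['color' => 'blue'],
];

// Numeric keys (the class list) are appended; identical string keys with
// scalar values (style color) are collected into an array.
$merged = array_merge_recursive($first, $second);
print_r($merged);
```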

You can use a regular expression to extract that information:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
$pattern = '/(\\w+)\s*=\\s*("[^"]*"|\'[^\']*\'|[^"\'\\s>]*)/';
preg_match_all($pattern, $attribs, $matches, PREG_SET_ORDER);
$attrs = array();
foreach ($matches as $match) {
    $value = $match[2];
    // Strip matching surrounding quotes. The length check avoids reading
    // $value[0] when the unquoted value is empty.
    if (strlen($value) >= 2 && ($value[0] == '"' || $value[0] == "'") && $value[0] == substr($value, -1)) {
        $value = substr($value, 1, -1);
    }
    $name = strtolower($match[1]);
    $value = html_entity_decode($value);
    switch ($name) {
    case 'class':
        // split the class list at whitespace
        $attrs[$name] = preg_split('/\s+/', trim($value));
        break;
    case 'style':
        // parse CSS property declarations
        break;
    default:
        $attrs[$name] = trim($value);
    }
}
var_dump($attrs);
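For the style branch that is left as a stub above, a naive sketch could look like the following. The helper name parse_style is hypothetical, and it assumes the declarations contain no CSS comments and no ; inside a value (for example inside a url(...)), which is exactly the harder case this answer warns about. It could be wired in as $attrs[$name] = parse_style($value); inside case 'style'.

```php
<?php
// Naive sketch: assumes no CSS comments and no ';' inside any value.
// Colons inside values are fine, because each declaration is split only
// at its first ':'.
function parse_style(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // skip empty chunks such as the one after a trailing ';'
        }
        $property = strtolower(trim($parts[0]));
        if ($property !== '') {
            $declarations[$property] = trim($parts[1]);
        }
    }
    return $declarations;
}

print_r(parse_style('background-color:#fff; color: red; '));
```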

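From here, class only needs to be split at whitespace (as done above), and style needs its property declarations parsed, which is a bit harder because they can contain comments and URLs with ; in them.

This might help you.
What it does: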

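An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way
Requires PHP 5+
Supports invalid HTML
Finds tags on an HTML page with selectors, just like jQuery
Extracts contents from HTML in a single line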

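You can't use regular expressions to parse HTML attributes, because the syntax is context-dependent. You can use regular expressions to tokenize the input, but you need a state machine to parse it.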
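To make the tokenize-versus-state-machine point concrete, here is a minimal sketch of a hand-written parser that walks the string one character at a time through a handful of states. The function name parse_attributes_fsm is hypothetical. It handles double-quoted, single-quoted, unquoted and valueless attributes, but it does not decode character references, and a repeated name overwrites the earlier value (an HTML parser keeps the first one).

```php
<?php
// Minimal state-machine sketch for an HTML attribute string.
function parse_attributes_fsm(string $input): array
{
    $attrs = [];
    $state = 'before_name';
    $name = $value = $quote = '';
    $len = strlen($input);

    // Run one step past the end; '' acts as an end-of-input marker.
    for ($i = 0; $i <= $len; $i++) {
        $c = $i < $len ? $input[$i] : '';
        $isSpace = $c !== '' && strpos(" \t\n\r\f", $c) !== false;

        switch ($state) {
            case 'before_name':
                if ($c !== '' && !$isSpace) {
                    $name = $c;
                    $state = 'name';
                }
                break;
            case 'name':
                if ($c === '=') {
                    $state = 'before_value';
                } elseif ($isSpace) {
                    $state = 'after_name';
                } elseif ($c === '') {
                    $attrs[strtolower($name)] = ''; // valueless attribute at the end
                } else {
                    $name .= $c;
                }
                break;
            case 'after_name':
                if ($c === '=') {
                    $state = 'before_value';
                } elseif ($c === '' || !$isSpace) {
                    // No '=' follows: boolean attribute; $c starts the next name.
                    $attrs[strtolower($name)] = '';
                    $name = $c;
                    $state = $c === '' ? 'before_name' : 'name';
                }
                break;
            case 'before_value':
                if ($c === '"' || $c === "'") {
                    $quote = $c;
                    $value = '';
                    $state = 'quoted_value';
                } elseif ($c === '') {
                    $attrs[strtolower($name)] = ''; // "name=" at the end
                } elseif (!$isSpace) {
                    $value = $c;
                    $state = 'unquoted_value';
                }
                break;
            case 'quoted_value':
                // Only the matching quote (or end of input) closes the value.
                if ($c === $quote || $c === '') {
                    $attrs[strtolower($name)] = $value;
                    $state = 'before_name';
                } else {
                    $value .= $c;
                }
                break;
            case 'unquoted_value':
                if ($isSpace || $c === '') {
                    $attrs[strtolower($name)] = $value;
                    $state = 'before_name';
                } else {
                    $value .= $c;
                }
                break;
        }
    }
    return $attrs;
}

print_r(parse_attributes_fsm(' id= "header " class = "foo   bar" disabled'));
```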
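If performance isn't a big issue, the safest approach is probably to wrap the attributes in a tag and send that through an HTML parser. For example: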
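function parse_attributes($input) {
  $dom = new DOMDocument();
  // loadHTML() warns about the unknown <foo> tag, so collect libxml errors quietly
  $previous = libxml_use_internal_errors(true);
  $dom->loadHTML("<foo " . $input . "/>");
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  // loadHTML() wraps the fragment in <html><body>, so $dom->documentElement
  // is <html>; fetch the <foo> wrapper by name instead
  $element = $dom->getElementsByTagName('foo')->item(0);
  $attributes = array();
  foreach ($element->attributes as $attr) {
    $attributes[$attr->nodeName] = $attr->value;
  }
  return $attributes;
}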


You could probably optimize the above by reusing the parser, or by using … or ….

Using SimpleXML:
<?php
$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$x = new SimpleXMLElement("<element $attribs />");

print_r($x);

?>
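print_r($x) only dumps the object. A minimal sketch that copies the attributes into a plain array with SimpleXMLElement::attributes() follows; the helper name attributes_via_simplexml is hypothetical. It assumes the input is well-formed XML attribute syntax (every attribute has a quoted value), otherwise the constructor throws, which is the limitation noted in the reply to this answer.

```php
<?php
// Sketch: parse an attribute string via SimpleXML into a plain array.
// Throws an Exception if the string is not valid XML attribute syntax
// (for example a bare boolean attribute such as "disabled").
function attributes_via_simplexml(string $attribs): array
{
    // Collect libxml errors instead of emitting warnings, then restore.
    $previous = libxml_use_internal_errors(true);
    try {
        $element = new SimpleXMLElement("<element $attribs />");
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }

    $result = [];
    foreach ($element->attributes() as $name => $value) {
        $result[$name] = (string) $value; // SimpleXMLElement -> string
    }
    return $result;
}

print_r(attributes_via_simplexml(' id= "header " class = "foo   bar"'));
```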



This assumes the attributes are always name/value pairs…
A simpler way could also be:
$atts_array = current((array) new SimpleXMLElement("<element $attribs />"));
A simple, effective function to solve this:
function attrString2Array($attr) {
  $atList = [];

  if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
    for ($i = 0; $i < count($m[0]); $i++) {
      if ($m[3][$i] !== '') // valueless (boolean) attribute
        $atList[$m[3][$i]] = null;
      else
        $atList[$m[1][$i]] = $m[2][$i];
    }
  }

  return $atList;
}

print_r(attrString2Array('<li data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif">'));
print_r(attrString2Array('data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif"'));

//Array
//(
//    [data-tpl-classname] => class
//    [data-tpl-title] => innerHTML
//    [disabled] => 
//    [nowrap] => 
//    [href] => #
//    [hide] => 
//    [src] => images/asas.gif
//)

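Thanks, Gumbo, your regex is cool. The only problem is that $attrs['class'] and $attrs['style'] come back as strings, so it's hard to merge them with another $attribs string. For example, I want to merge two sets of attributes, $attribs1 = 'class="foo bar"'; and $attribs2 = 'class="lorem"';, into a single 'class="foo bar lorem"'. That's why I want $attrs['class'] to return an array: array('foo', 'bar'). Any idea how to enhance it?

I just wrote an alternative regex that also parses HTML5-style boolean attributes (without an = sign) and uses a back-reference for the quotes: (\w+)\s*(=\s*(["'])(.*?)\3\s)?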
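The back-reference idea from that comment can be sketched as a small PHP helper. This is a hypothetical function (parse_attributes_backref), not the commenter's code: it uses [\w-]+ instead of \w+ so that data-* names match, drops the trailing \s so the last attribute keeps its value, and marks valueless attributes as true. Unquoted values such as width=100 are not handled and will be mis-parsed.

```php
<?php
// Sketch: name="value" or name='value' (the closing quote must match the
// opening one via the \2 back-reference), or a bare boolean attribute.
function parse_attributes_backref(string $input): array
{
    $pattern = '/([\w-]+)(?:\s*=\s*(["\'])(.*?)\2)?/s';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $attrs = [];
    foreach ($matches as $m) {
        // With PREG_SET_ORDER, trailing groups that did not take part are
        // absent, so a missing $m[3] means "no value": a boolean attribute.
        $attrs[strtolower($m[1])] = isset($m[3]) ? $m[3] : true;
    }
    return $attrs;
}

print_r(parse_attributes_backref('disabled nowrap href="#"'));
```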
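Try parsing this: foo='bar' cuux="O'Reiley" zip="\"zap\""

@troelskn: The third attribute value declaration is invalid. The " needs to be written as a character reference.

You're right, I didn't realize that. I'd still recommend using an XML/HTML parser to account for all the odd edge cases.

Note that one reason I ended up here is that DOMProcessingInstruction has a data field, which holds the text inside the <?…?> tag. For such a tag you get a plain string like: type="text/xsl" href="https://sms.m2osw.com/sitemap.xsl"
which you then need to parse as attributes.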
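A minimal sketch of that use case, assuming the processing instruction's data is a well-formed list of name="value" pairs: it reads DOMProcessingInstruction's data and target properties and reuses the SimpleXML trick from the earlier answer. The xml-stylesheet instruction and the urlset root element are assumed example input, not taken from the comment.

```php
<?php
// Assumed example document with one processing instruction.
$xml = '<?xml version="1.0"?>'
     . '<?xml-stylesheet type="text/xsl" href="https://sms.m2osw.com/sitemap.xsl"?>'
     . '<urlset/>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Collect pseudo-attributes per PI target. $node->data is a plain string,
// so wrap it in a dummy element and let SimpleXML parse the pairs.
$pseudoAttributes = [];
foreach ($doc->childNodes as $node) {
    if ($node instanceof DOMProcessingInstruction) {
        $element = new SimpleXMLElement('<pi ' . $node->data . '/>');
        foreach ($element->attributes() as $name => $value) {
            $pseudoAttributes[$node->target][$name] = (string) $value;
        }
    }
}
print_r($pseudoAttributes);
```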
您需要将其解析为属性。欢迎使用StackOverflow！请编辑您的答案以提供代码的解释。这将提高你的答案的质量，并使其更有可能获得更高的投票率：）你注意到OP的问题是寻求一个多维度的结果吗？