PHP HTML parsing: How do I get a web page's charset value with Simple HTML DOM Parser?

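PHP: How do I get the charset value (utf-8, windows-255, etc.) of a web page using Simple HTML DOM?

Note: it must be done with the HTML DOM parser.

Example charset declarations in the page: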

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">

<meta content="text/html; charset=windows-255" http-equiv="Content-Type">

$html = file_get_html('http://www.google.com/');
$el=$html->find('meta[content]',0);
echo $el->charset;

What should I change? (I know $el->charset doesn't work.)

Thanks

You'll have to match the string with a regular expression (I hope you have PCRE...).

It's not very robust, but it should work:

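// $data is assumed to hold the page's HTML, e.g. $data = file_get_contents($url);
libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
$dd = new DOMDocument;
$dd->loadHTML($data);
foreach ($dd->getElementsByTagName('meta') as $meta) {
    // Only look at <meta http-equiv="Content-Type">; compare case-insensitively
    if (strtolower($meta->getAttribute('http-equiv')) === 'content-type') {
        $v = $meta->getAttribute('content');
        // Capture what follows "charset=" in e.g. "text/html; charset=utf-8".
        // Use a separate $matches array instead of overwriting the loop variable.
        if (preg_match('#.+?/.+?;\s?charset\s?=\s?(.+)#i', $v, $matches)) {
            echo $matches[1];
        }
    }
}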

Note that the DOM extension implicitly converts all data to UTF-8.

Thanks to MvanGeest for the answer. I only made a small change, and it works perfectly:

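$html = file_get_html('http://www.google.com/');
// First <meta> element that has a content attribute (assumed to be the Content-Type one)
$el = $html->find('meta[content]', 0);
$fullvalue = $el->content;   // e.g. "text/html; charset=utf-8"
preg_match('/charset=(.+)/', $fullvalue, $matches);
// $matches[0] is the whole match, e.g. "charset=utf-8"; strip the prefix
echo substr($matches[0], strlen("charset="));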
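The regex step can also be pulled into a small standalone function. This is a minimal sketch, not the answer's code: the name parseCharset is hypothetical, and it assumes you already have the content attribute value as a string (for example $el->content). It reads the capture group $matches[1] directly instead of calling substr, matches case-insensitively, and stops the charset at whitespace, a semicolon or a quote.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type value such as
// "text/html; charset=utf-8". Returns null when no charset is present.
function parseCharset(string $contentType): ?string
{
    // "i" flag: also accept "Charset=" or "CHARSET=".
    // The capture group stops at whitespace, ";" or a quote character.
    if (preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $contentType, $matches)) {
        return $matches[1]; // capture group only, so no substr() is needed
    }
    return null;
}

echo parseCharset('text/html; charset=utf-8'), "\n";       // utf-8
echo parseCharset('text/html; charset=windows-255'), "\n"; // windows-255
var_dump(parseCharset('text/html'));                        // NULL
```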

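Run the XPath query

//meta[@http-equiv="Content-Type"]/@content

on the document. You'll have to parse the attribute value yourself.

@Frank SimpleHTMLDom can't do XPath. Suggested third-party alternatives that actually use the DOM instead of string parsing: …, … and ….

Now this is a bit more robust than what I wrote... :)

Thanks for this option, since having UTF-8 data is very important.

@Mva Yes, Content-Type is sometimes written as "content-type". At least in HTTP headers, case doesn't matter.

DOMDocument can't always convert the text to UTF-8 correctly. I'm still struggling with this.

Thanks! I fixed it a bit and it works; see my answer.

Never mind, I made a mistake. It should be $matches[1]. That makes it faster and more reliable.

Strange... this works for me. You don't need substr, though, just $matches[1]. I tested it with Google.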
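The XPath answer can be tried with PHP's built-in DOMDocument and DOMXPath, since Simple HTML DOM has no XPath support. This is a sketch under assumptions: the function name charsetViaXPath is hypothetical, and because XPath 1.0 string comparison is case-sensitive, it lowercases http-equiv with translate() so that "content-type" also matches. As the answer says, the query returns the whole content attribute, so the charset still has to be parsed out of it.

```php
<?php
// Hypothetical helper: find the charset via an XPath query on the DOM.
function charsetViaXPath(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Lowercase @http-equiv before comparing, so "Content-Type" and
    // "content-type" are treated the same.
    $query = '//meta[translate(@http-equiv, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",'
           . ' "abcdefghijklmnopqrstuvwxyz") = "content-type"]/@content';
    $attr = $xpath->query($query)->item(0);
    if ($attr === null) {
        return null; // no <meta http-equiv="Content-Type"> found
    }

    // The attribute holds e.g. "text/html; charset=utf-8"; extract the charset.
    if (preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $attr->nodeValue, $m)) {
        return $m[1];
    }
    return null;
}

$page = '<html><head><meta content="text/html; charset=utf-8" '
      . 'http-equiv="Content-Type"></head><body></body></html>';
echo charsetViaXPath($page), "\n"; // utf-8
```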