PHP DOM UTF-8 encoding problem
Tags: php, zend-framework, domdocument, domxpath
I run into an encoding problem after executing queryXpath:
<?php
header ( 'Content-Type: text/html; charset=utf-8' );
mb_internal_encoding ( 'utf-8' );
mb_http_output ( 'utf-8' );
mb_http_input ( 'utf-8' );
mb_regex_encoding ( 'utf-8' );
ini_set ( 'include_path', 'ZendFramework-2.4.9\library' );
require_once 'Zend/Loader/StandardAutoloader.php';
$autoloader = new Zend\Loader\StandardAutoloader ( array (
'fallback_autoloader' => true
) );
$autoloader->register ();
use Zend\Dom\Query;
use Zend\Debug\Debug;
$url = "http://expert.com.pt/115-5-programas/14865-02-809-002-00263-meireles-maq-lavar-loica-mll-125-w-5604409141651.html";
$ch = curl_init ( $url );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, true );
$content = curl_exec ( $ch );
curl_close ( $ch );
$pdom = new Query ( mb_convert_encoding ( $content, 'HTML-ENTITIES', "UTF-8" ) );
// $pdom->setEncoding('UTF-8');
// echo $pdom->getEncoding();
$result = $pdom->queryXpath ( '//*[@itemtype="http://schema.org/Product"]' );
if ($result->count ()) {
    foreach ( $result as $r ) {
        // echo "----------------------------------------";
        if ($r->hasChildNodes ()) {
            $lbHtml = $r->C14N ();
            $dom2 = new Query ( $lbHtml );
            $nome_produto = $dom2->queryXpath ( '//*[@itemprop="name"]' );
            $ref_expert = $dom2->queryXpath ( '//*[@itemprop="sku"]' );
            $preco = $dom2->queryXpath ( '//*[@itemprop="price"]' );
            // *[@itemprop="image"] // small pic
            $imagem = $dom2->queryXpath ( '//*[@id="bigpic"]' );
            $peq_desc = $dom2->queryXpath ( '//*[@itemprop="description"]' );
            // *[contains(@class,"product-desc")]
            $url_prod = $dom2->queryXpath ( '//*[contains(@class,"pb-center-column col-xs-12 col-sm-4")]/p[4]/a' );
            $categoria = $pdom->queryXpath ( '//*[contains(@class,"breadcrumb clearfix")]/a[4]' ); // categoria
            if ($nome_produto->count ()) {
                foreach ( $nome_produto as $name ) {
                    $_arr ['name'] = $name->nodeValue;
                }
            }
            if ($ref_expert->count ()) {
                foreach ( $ref_expert as $ref ) {
                    $_arr ['ref'] = $ref->nodeValue;
                }
            }
            if ($preco->count ()) {
                foreach ( $preco as $_preco ) {
                    preg_match ( "/((?:[0-9]+,)*[0-9]+(?:\.[0-9]+)?)/", $_preco->nodeValue, $_preco );
                    $_arr ['price'] = ( float ) str_replace ( ",", ".", $_preco [0] );
                }
            }
            if ($imagem->count ()) {
                foreach ( $imagem as $_image ) {
                    $_arr ['image'] = $_image->getAttribute ( 'src' );
                }
            }
            if ($peq_desc->count ()) {
                foreach ( $peq_desc as $_peqdesc ) {
                    $_arr ['description_small'] = $_peqdesc->C14N ();
                }
            }
            if ($url_prod->count ()) {
                foreach ( $url_prod as $_url_prod ) {
                    $_arr ['url_prod'] = $_url_prod->getAttribute ( 'href' );
                }
            }
            if ($categoria->count ()) {
                foreach ( $categoria as $_categoria ) {
                    $_arr ['categoria'] = $_categoria->nodeValue;
                }
            }
            // die();
        }
    }
}
echo "<pre>";
print_r ( $_arr );
Found the problem.
For every Zend_Dom_Query($html) I create, I have to put the charset meta tag (shown below) into the HTML.
Comments:
Just to be clear, what is your expected output?
[name] => MEIRELES - Máq. Lavar Loiça MLL 125 W, with the correct encoding.
The problem was that my filtered HTML output did not carry the meta tag, and neither did the subqueries.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
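The fix above can be sketched with PHP's built-in DOMDocument, which Zend\Dom\Query builds on. This is a minimal illustration, not the original poster's code: the helper names `withUtf8Meta()` and `extractName()` and the sample fragment are assumptions made for the example. It shows that a fragment re-parsed without a charset declaration (such as the output of `C14N()`) may be decoded with the wrong encoding, while prepending the meta tag lets the parser read it as UTF-8. The file itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): prepend a charset
// declaration so the HTML parser decodes the fragment as UTF-8.
function withUtf8Meta(string $html): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . $html;
}

// Hypothetical helper: parse an HTML string and return the text of the
// first element carrying itemprop="name", or '' if there is none.
function extractName(string $html): string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; silence parser warnings.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@itemprop="name"]');
    return $nodes->length ? $nodes->item(0)->nodeValue : '';
}

// Sample UTF-8 fragment without any charset information, similar to
// what $r->C14N() hands to the second Query object in the question.
$fragment = '<div><h1 itemprop="name">Máq. Lavar Loiça</h1></div>';

// Without the meta tag the accented characters are typically mangled.
$withoutMeta = extractName($fragment);

// With the meta tag the text is decoded as UTF-8.
$withMeta = extractName(withUtf8Meta($fragment));

echo $withoutMeta, PHP_EOL;
echo $withMeta, PHP_EOL;
```

In the question's code the same idea would apply to both the downloaded page and each `$lbHtml` fragment before they are passed to `new Query ( ... )`.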