How to decode numeric HTML entities in PHP

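I'm trying to decode an encoded long dash (the en dash, U+2013) from its numeric entity into a string, but I can't find a function that does this properly.

The best I have found is mb_decode_numericentity(), but for some reason it fails to decode the long dash and some other special characters.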

$str = '&#8211;';

$str = mb_decode_numericentity($str, array(0xFF, 0x2FFFF, 0, 0xFFFF), 'ISO-8859-1');

This returns "?".

Does anyone know how to fix this?

mb_decode_numericentity does not handle hexadecimal entities, only decimal ones. Do you get the expected result with:

$str = '&#8211;';

$str = mb_decode_numericentity ( $str , Array(255, 3145727, 0, 65535) , 'ISO-8859-1');
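As the comments further down point out, the "?" most likely comes from the target charset rather than from the convmap: U+2013 simply does not exist in ISO-8859-1. The sketch below keeps the question's convmap unchanged and only switches the target encoding to UTF-8. It assumes the mbstring extension is loaded.

```php
<?php
// Same convmap as in the question: [start, end, offset, mask].
$convmap = [0xFF, 0x2FFFF, 0, 0xFFFF];

$str = '&#8211;';
// Only the target encoding differs from the question: UTF-8 can represent U+2013.
$decoded = mb_decode_numericentity($str, $convmap, 'UTF-8');

// U+2013 EN DASH is the three-byte sequence E2 80 93 in UTF-8.
echo bin2hex($decoded), "\n";
```

If this prints e28093, the decoding itself works and the remaining question is only which charset the page is served in.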

You can use hexdec() to convert hexadecimal to decimal.
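A minimal sketch of that idea: rewrite hexadecimal entities as decimal ones before handing the string to a decimal-only decoder. The function name hex_entities_to_decimal() is hypothetical, chosen for this example.

```php
<?php
// Hypothetical helper: turns &#x2013; into &#8211; (case-insensitive on the "x" and hex digits).
function hex_entities_to_decimal(string $str): string
{
    return preg_replace_callback(
        '/&#x([0-9a-f]+);/i',
        // hexdec() converts the captured hex digits into an integer code point.
        function (array $m): string {
            return '&#' . hexdec($m[1]) . ';';
        },
        $str
    );
}

echo hexdec('2013'), "\n";                          // 8211
echo hex_entities_to_decimal('a &#x2013; b'), "\n"; // a &#8211; b
```

A closure is used for the callback; it is an ordinary object that PHP frees after use, unlike functions generated with create_function().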

Also, out of curiosity, does the following work?

$str = '&#8211;';

 $str = html_entity_decode($str);
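html_entity_decode() decodes named, decimal and hexadecimal entities in one call, but its result depends on the target charset. Before PHP 5.4 the default charset was ISO-8859-1, which has no en dash. The sketch below passes the flags and charset explicitly; ENT_HTML5 and 'UTF-8' are choices made for this example, not requirements stated in the thread.

```php
<?php
// Explicit flags and target charset instead of relying on defaults.
$flags = ENT_QUOTES | ENT_HTML5;

foreach (['&#8211;', '&#x2013;', '&ndash;'] as $entity) {
    $decoded = html_entity_decode($entity, $flags, 'UTF-8');
    // Decimal, hexadecimal and named forms of the en dash all yield E2 80 93.
    echo $entity, ' -> ', bin2hex($decoded), "\n";
}
```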

The following snippet (mostly stolen from … and improved) works for literal (named), decimal numeric and hexadecimal numeric entities:

header("content-type: text/html; charset=utf-8");

/**
* Decodes all HTML entities, including numeric and hexadecimal ones.
* 
* @param mixed $string
* @return string decoded HTML
*/

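function html_entity_decode_numeric($string, $quote_style = ENT_COMPAT, $charset = "utf-8")
{
$string = html_entity_decode($string, $quote_style, $charset);
// Hexadecimal entities such as &#x201D;
$string = preg_replace_callback('~&#x([0-9a-f]+);~i', "chr_utf8_callback", $string);
// Decimal entities such as &#8221; (a closure, since the /e modifier no longer exists in PHP 7+)
$string = preg_replace_callback('~&#([0-9]+);~', function ($matches) { return chr_utf8((int) $matches[1]); }, $string);
return $string;
}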

/** 
 * Callback helper 
 */

function chr_utf8_callback($matches)
 { 
  return chr_utf8(hexdec($matches[1])); 
 }

/**
* Multi-byte chr(): Will turn a numeric argument into a UTF-8 string.
* 
* @param mixed $num
* @return string
*/

function chr_utf8($num)
{
if ($num < 128) return chr($num);
if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
return '';
}


$string ="&#x201D;"; 

echo html_entity_decode_numeric($string);
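To sanity-check chr_utf8(), one can work through the byte arithmetic for U+2013 and compare the function's output with what html_entity_decode() produces for the same code points. The sketch repeats chr_utf8() so it runs on its own; the list of code points tested is an arbitrary pick.

```php
<?php
// Copy of chr_utf8() from the answer, so this check is self-contained.
function chr_utf8($num)
{
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// U+2013 (8211) needs three bytes, 1110xxxx 10xxxxxx 10xxxxxx:
//   8211 >> 12       = 2  -> 2 + 224  = 0xE2
//   (8211 >> 6) & 63 = 0  -> 0 + 128  = 0x80
//   8211 & 63        = 19 -> 19 + 128 = 0x93
echo bin2hex(chr_utf8(8211)), "\n"; // e28093

// Compare with PHP's own decoder for a few code points.
foreach ([65, 233, 8211, 8221, 128512] as $cp) {
    $expected = html_entity_decode('&#' . $cp . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    printf("U+%04X %s\n", $cp, chr_utf8($cp) === $expected ? 'ok' : 'MISMATCH');
}
```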


Suggestions for improvement are welcome.

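Is there even a long dash in ISO-8859-1?

@Colshrapel: Indeed there isn't. It exists in Windows cp1252, which is similar, but not in ISO-8859-1. Better: use UTF-8.

There is definitely no long dash in ISO/IEC 8859-1 (Latin-1). It is a Unicode character, so using UTF-8 will help.

I forgot to change the encoding in my browser; my mistake. Thanks, everyone!

Thanks for the quick reply, but this also returns "?": $str = html_entity_decode($str); That was the first thing I tried.

No. @Yuriy: now that you have commented on the question about your mistake, please retract or confirm your comment on this answer. I think html_entity_decode() is the simplest correct solution.

&apos; is not a valid HTML entity reference, but it is not uncommon for it to "spill over" from XML documents. Add the following to make it fully watertight:

$string = str_ireplace("&apos;", "'", $string);

Another improvement: this code has a serious memory leak. Every time the function is called, a new lambda function created with create_function() gets stuck in memory. Yes, the manual page for preg_replace_callback() suggests that a lambda function is a "good idea" that makes the code look cleaner, but that is wrong. Create a simple real function:

function chr_utf8_callback($matches) { return chr_utf8(hexdec($matches[1])); }

and use it instead:

$string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', 'chr_utf8_callback', $string);

The memory leak is gone.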
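The &apos; behaviour mentioned in the comments can be checked directly. Under the HTML 4.01 document type (ENT_HTML401, the default), &apos; is not a known entity and is left untouched; with ENT_HTML5 and ENT_QUOTES, html_entity_decode() decodes it itself, so the str_ireplace() workaround may only be needed when decoding with HTML 4.01 rules. The sample string is made up for this sketch.

```php
<?php
$in = 'It&apos;s';

// HTML 4.01 rules: &apos; is not in the entity table, so it stays as-is.
var_dump(html_entity_decode($in, ENT_QUOTES | ENT_HTML401, 'UTF-8'));

// HTML5 rules know &apos;; ENT_QUOTES is needed for quote entities to be decoded.
var_dump(html_entity_decode($in, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

// The str_ireplace() workaround from the comment gives the same result.
var_dump(str_ireplace('&apos;', "'", $in));
```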