PHP - Convert from UTF-8 to Unicode hex
Is there a simple way to convert a UTF-8 string to Unicode code points?
What I basically want to do is convert 'è' to '00E8'.
You can use json_encode to do this:
$str = "è";
$str = json_encode($str);
print $str;
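This prints "\u00e8", including the double quotes that json_encode puts around a string. If needed, you can use str_replace to remove the \u (and trim to drop the quotes). If you want E instead of e, you can use strtoupper.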
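A minimal sketch chaining the steps above (json_encode, trim, str_replace, strtoupper) into one function; the name `jsonHex` is hypothetical. It only yields hex for characters that json_encode escapes, which by default means non-ASCII characters:

```php
<?php
// Hypothetical helper: "è" -> "\"\u00e8\"" -> "00E8" via json_encode.
// Caveat: json_encode leaves ASCII unescaped, so "H" comes back as "H".
function jsonHex($str)
{
    $json = json_encode($str);            // e.g. "\u00e8" wrapped in double quotes
    $json = trim($json, '"');             // drop the surrounding double quotes
    $hex  = str_replace('\u', '', $json); // remove the literal \u prefixes
    return strtoupper($hex);              // 00e8 -> 00E8
}

echo jsonHex("è"), "\n"; // 00E8
echo jsonHex("H"), "\n"; // H (ASCII is not escaped)
```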
Here's a little something I adapted from a debug class:
/**
* Display special/invalid UTF-8 sequences and non-printable characters as hex
*
* @param string $str string containing binary
* @param boolean $htmlout add html markup?
*
* @return string
*/
public function strInspect($str, $htmlout = false)
{
$this->htmlout = $htmlout;
$regex = <<<EOD
/
( [\x01-\x7F] ) # single-byte sequences 0xxxxxxx (ascii 0 - 127)
| (
(?: [\xC0-\xDF][\x80-\xBF] # double-byte sequences 110xxxxx 10xxxxxx
| [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences 1110xxxx 10xxxxxx * 2
| [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
){1,100} # ...1 to 100 consecutive sequences
)
| ( [\x80-\xBF] ) # invalid byte in range 10000000 - 10111111 128 - 191
| ( [\xC0-\xFF] ) # invalid byte in range 11000000 - 11111111 192 - 255
| (.) # null (including x00 in the regex = fail)
/x
EOD;
// The callback is a method of this object, so pass it as array($this, name)
$str = preg_replace_callback($regex, array($this, 'strInspectCallback'), $str);
return $str;
}
/**
* Callback used by strInspect's preg_replace_callback
*
* @param array $matches matches
*
* @return string
*/
protected function strInspectCallback($matches)
{
$showHex = false;
if ($matches[1] !== '') {
// single byte sequence (may contain control char)
$str = $matches[1];
if (ord($str) < 32 || ord($str) == 127) {
$showHex = true;
if (in_array($str, array("\t","\n","\r"))) {
$showHex = false;
}
}
} elseif ($matches[2] !== '') {
// Valid multi-byte sequence(s): returned unmodified unless it is one of the sequences below
$str = $matches[2];
$sequences = array(
"\xef\xbb\xbf", // BOM
"\xc2\xa0", // no-break space
// "\xE2\x80\x89", // thin space
// "\xE2\x80\xAF", // narrow no-break space
"\xEF\xBF\xBD", // "Replacement Character"
);
foreach ($sequences as $seq) {
if ($str === $seq) {
$showHex = true;
break;
}
}
} elseif ($matches[3] !== '' || $matches[4] !== '') {
// Invalid byte
$str = $matches[3] != ''
? $matches[3]
: $matches[4];
$showHex = true;
} else {
// null char
$str = $matches[5];
$showHex = true;
}
if ($showHex) {
$chars = str_split($str);
foreach ($chars as $i => $c) {
$chars[$i] = '\x'.bin2hex($c);
}
$str = implode('', $chars);
}
return $str;
}
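Note that strInspect escapes raw bytes, not code points: for a flagged sequence, the '\x'.bin2hex($c) step turns each byte into \xNN, so a UTF-8 'è' (bytes C3 A8) would come out as \xc3\xa8, not 00E8. A minimal sketch of just that byte-escaping step, applied to every byte of a string (the name `hexBytes` is hypothetical):

```php
<?php
// Hypothetical helper: escape every byte of $str as \xNN, the same
// per-byte conversion strInspectCallback applies to flagged sequences.
function hexBytes($str)
{
    $out = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $out .= '\x' . bin2hex($str[$i]); // one byte -> two lowercase hex digits
    }
    return $out;
}

echo hexBytes("è"), "\n"; // \xc3\xa8 (UTF-8 bytes, not the code point U+00E8)
echo hexBytes("H"), "\n"; // \x48
```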
I tried it, but it doesn't work for ASCII characters. I'm basically looking for something that converts, for example, H to 0048, è to 00E8, and so on.
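To get a code point for every character, ASCII included, one option is to split the string into UTF-8 characters and decode each one with the same bit patterns listed in the regex comments (110xxxxx 10xxxxxx, and so on). This is a sketch that assumes valid UTF-8 input and uses only core PCRE and string functions; the name `utf8ToHex` is hypothetical:

```php
<?php
// Hypothetical helper: convert each character of a UTF-8 string to its
// Unicode code point as uppercase hex, zero-padded to at least 4 digits.
function utf8ToHex($str)
{
    // Split into characters (not bytes); /u makes PCRE UTF-8 aware.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $result = array();
    foreach ($chars as $char) {
        $bytes = array_values(unpack('C*', $char)); // byte values 0-255
        $n = count($bytes);
        if ($n === 1) {          // 0xxxxxxx
            $cp = $bytes[0];
        } elseif ($n === 2) {    // 110xxxxx 10xxxxxx
            $cp = $bytes[0] & 0x1F;
        } elseif ($n === 3) {    // 1110xxxx 10xxxxxx 10xxxxxx
            $cp = $bytes[0] & 0x0F;
        } else {                 // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            $cp = $bytes[0] & 0x07;
        }
        // Append the 6 payload bits of each continuation byte.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $result[] = sprintf('%04X', $cp);
    }
    return $result;
}

echo implode(' ', utf8ToHex("Hè")), "\n"; // 0048 00E8
```

If the mbstring extension is available (PHP 7.2+), sprintf('%04X', mb_ord($char, 'UTF-8')) gives the same value per character; the manual decoding above just avoids depending on that extension.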