How do I convert UTF-16 surrogate pairs to their equivalent hex code points in PHP?


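I am building an application in which chat messages are sent from an iOS app, while an administrator views the chats from an admin panel built in PHP.

From the DB, I receive chat messages like this: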

Hi, Jax\ud83d\ude1b\ud83d\ude44! can we go for a coffee?

I am using twemoji, which can convert hex code points into images:

window.onload = function() {

  // Set the size of the rendered Emojis
  // This can be set to 16x16, 36x36, or 72x72
  twemoji.size = '16x16';

  // Parse the document body and
  // insert <img> tags in place of Unicode Emojis
  twemoji.parse(document.body);
}

Specifically, in the PHP part I have the following code:

$text = "This is fun \u1f602! \u1f1e8 ";
$html = preg_replace("/\\\\u([0-9A-F]{2,5})/i", "&#x$1;", $text);
echo $html;

Now, twemoji parses the entire body of the HTML document and replaces the hex code points with images.


So I need all UTF-16 escapes in the text replaced with hex code points (for the emojis).

How can I do this?

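There is a two-fold problem here:

Detecting that encoded surrogate pairs are present
Actually converting those surrogate pairs into HTML entities

Explaining the intricacies of the problem is well beyond the scope of a single answer (you would have to read up on UTF-16 for that), but this code snippet seems to solve your problem: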

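$text = "Hi, Jax\\ud83d\\ude1b\\ud83d\\ude44! can we go for a coffee?";

// Match a high-surrogate escape (\uD800-\uDBFF) immediately followed by
// a low-surrogate escape (\uDC00-\uDFFF).
$result = preg_replace_callback('/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})/i', function ($matches) {
    // hexdec() parses the captured hex digits, so eval() is not needed.
    $first = hexdec($matches[1]);
    $second = hexdec($matches[2]);
    // Keep the low 10 bits of each surrogate, combine them, then add the offset.
    $value = (($first & 0x3ff) << 10) | ($second & 0x3ff);
    $value += 0x10000;
    // Emit a decimal numeric HTML entity.
    return "&#$value;";
}, $text);

echo $result;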
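
To make the arithmetic in the snippet above easier to follow, here is a minimal sketch that isolates it in a function. The name `surrogatePairToCodePoint` and the range check are my own additions for illustration and are not part of the original answer. Each surrogate carries 10 payload bits: the high surrogate holds the upper 10 bits, the low surrogate the lower 10, and the result is offset by 0x10000.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// combine a UTF-16 surrogate pair into the code point it encodes.
function surrogatePairToCodePoint(int $high, int $low): int
{
    // High surrogates are D800-DBFF, low surrogates are DC00-DFFF.
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair');
    }
    // Take the low 10 bits of each half, join them, and add the 0x10000 offset.
    return 0x10000 + ((($high & 0x3FF) << 10) | ($low & 0x3FF));
}

// The two pairs from the chat message in the question:
printf("U+%X\n", surrogatePairToCodePoint(0xD83D, 0xDE1B)); // U+1F61B
printf("U+%X\n", surrogatePairToCodePoint(0xD83D, 0xDE44)); // U+1F644
```

For `\ud83d\ude1b`, the high half contributes `0x03D << 10 = 0xF400`, the low half contributes `0x21B`, and `0xF400 | 0x21B = 0xF61B`; adding `0x10000` gives `0x1F61B`.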

Man, this works like a charm… but some pairs are not being converted.

Which pairs? Are you sure they are actual pairs and not BMP code points (which would simply be encoded as \uNNNN, with one escape sequence instead of two)? If that is the case, you need another regex replacement to filter them out. That is very simple, because you only have to replace \uNNNN with &#xNNNN; (where NNNN is exactly four hex digits), which a plain regex replacement can do.

Like \ud83d\ude43
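
The suggestion in the comments (handle surrogate pairs first, then the remaining single `\uNNNN` escapes) could be sketched in a single pass roughly as follows. The function name `escapesToHtmlEntities` is hypothetical, and leaving lone surrogates untouched is my own assumption about the desired behavior rather than something the thread specifies.

```php
<?php
// Hypothetical helper (an assumption, not from the thread): convert literal
// \uXXXX escapes in a string to hexadecimal numeric HTML entities.
function escapesToHtmlEntities(string $text): string
{
    // Alternative 1: a high surrogate escape followed by a low surrogate escape.
    // Alternative 2: any single four-digit escape.
    $pattern = '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})|\\\\u([0-9a-f]{4})/i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            // Surrogate pair: combine both halves into one supplementary code point.
            $cp = 0x10000 + (((hexdec($m[1]) & 0x3FF) << 10) | (hexdec($m[2]) & 0x3FF));
            return sprintf('&#x%X;', $cp);
        }
        $cp = hexdec($m[3]);
        if ($cp >= 0xD800 && $cp <= 0xDFFF) {
            // Lone surrogate: not a valid code point, so leave the escape as-is.
            return $m[0];
        }
        // Plain BMP code point: one escape maps directly to one entity.
        return sprintf('&#x%X;', $cp);
    }, $text);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

// Single quotes keep the backslashes literal, as they would arrive from the DB.
echo escapesToHtmlEntities('Hi, Jax\ud83d\ude1b\ud83d\ude44! caf\u00e9'), "\n";
// Hi, Jax&#x1F61B;&#x1F644;! caf&#xE9;
```

Putting the pair alternative first matters: otherwise the single-escape alternative would consume each half of a pair separately.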