XML: decode &#55357; into the real character

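When I read data from Twitter's Streaming API and then write it to an XML file, some special characters, such as &#55357;, cause errors (I mean that when I open the XML file in Chrome, Chrome reports an error at that character!).

Before writing the XML file, I want to convert the encoded sequence (&#55357;) into the real character.

How can I do this?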

------------- Update --------------

Here is the XML file content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<text>@carlyraejepsen would be a dream if you follow me, please follow me?, I love you so much you're my inspiration</text>
<text>someone please bring me a caramel apple and a mocha from black cat. i'll love you forever</text>
<text>“@G_MartinFlyKick: Marry me Juliet.I love you and that's all I really know.”&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;</text>
<text>"I need to see a picture of him cuz Im trying to imagine you guys making love and all I see is u climbing on top of a big question mark"lmao</text>
<text>@District3music hi, I LOVE YOU follow me please? &amp;lt;3 xx 23</text>
<text>RT @syardley_: So appreciative of my family and people I love, wouldn't be where I am without them. #thankful</text>
<text>#DISTRICT3HALLOWEENFOLLOWSPREE #DISTRICT3HALLOWEENFOLLOWSPREE #3EEKERFROMTHENETHERLANDS love you! Please follow ? @District3music x42</text>
<text>Arguably my favorite electronic music producer @Kluteuk is coming back to Toronto on Dec 22nd. So stoked. Guy has made so many tunes I LOVE.</text>
<text>The stakes are high, the water's rough, but this love is ours.</text>
<text>@NiallOfficial Answer me, I love you very much. Venezuela loves. jhgj</text>
<text>Love this shit http://t.co/qSP79NKx</text>
</root>

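The character reference &#55357; denotes a surrogate code point (U+D83D), so trying to convert it to a character is an error. It is not a character, not even half of a character.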
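To make this concrete, here is a small Python sketch that classifies the numbers appearing in the XML above. The helper names classify_code_point and encodes_as_utf8 are hypothetical, not from the thread. It shows that 55357 is U+D83D, a high surrogate, and that Python's strict UTF-8 codec refuses to encode it on its own. XML 1.0 likewise excludes the surrogate range D800-DFFF from its Char production, which is why a reference such as &#55357; makes the document not well-formed.

```python
def classify_code_point(n: int) -> str:
    """Classify the number inside a decimal reference such as &#55357;."""
    if 0xD800 <= n <= 0xDBFF:
        return "high surrogate"  # first half of a UTF-16 pair, not a character
    if 0xDC00 <= n <= 0xDFFF:
        return "low surrogate"   # second half of a UTF-16 pair, not a character
    if 0 <= n <= 0x10FFFF:
        return "character"
    return "out of range"


def encodes_as_utf8(n: int) -> bool:
    """True if chr(n) can be written with the strict UTF-8 codec."""
    try:
        chr(n).encode("utf-8")
    except UnicodeEncodeError:
        return False  # lone surrogates are rejected ("surrogates not allowed")
    return True


if __name__ == "__main__":
    # The two references from the tweet, plus the code point they encode together.
    for n in (55357, 56834, 128514):
        print(f"&#{n}; -> U+{n:04X} {classify_code_point(n)}, UTF-8 ok: {encodes_as_utf8(n)}")
```

Only the combined value 128514 (U+1F602) is a real character; each half on its own is rejected.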


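You need to go back to the point where the references are generated. The cause is probably a character-encoding mix-up. Surrogate code units can legitimately occur in UTF-16 data, but when that data is interpreted as characters (for example, when it is converted to another encoding or to character references), they must be processed in pairs.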
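Fixing the code that generates the references is the real cure. If the file has already been produced, though, the pairing can be redone afterwards. The following is a minimal Python sketch under stated assumptions: the helper fix_surrogate_refs is hypothetical, and it only handles decimal references (&#NNNN;), not hexadecimal ones (&#xD83D;). It merges an adjacent high/low surrogate reference pair into the real character using the standard UTF-16 formula, and replaces any unpaired surrogate with U+FFFD, since that is not a character.

```python
import re

# One decimal numeric character reference, e.g. "&#55357;".
_REF = re.compile(r"&#(\d+);")


def fix_surrogate_refs(text: str) -> str:
    """Replace surrogate-pair references with the real character.

    "&#55357;&#56834;" (high + low surrogate) becomes U+1F602. A surrogate
    reference without a partner becomes U+FFFD. All other references
    (e.g. &#60;) are left untouched.
    """
    refs = list(_REF.finditer(text))
    out, last, i = [], 0, 0
    while i < len(refs):
        m = refs[i]
        n = int(m.group(1))
        if not 0xD800 <= n <= 0xDFFF:
            i += 1  # ordinary reference: keep it as-is
            continue
        out.append(text[last:m.start()])
        nxt = refs[i + 1] if i + 1 < len(refs) else None
        lo = int(nxt.group(1)) if nxt is not None else -1
        # A valid pair is a high surrogate immediately followed by a low one.
        if 0xD800 <= n <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF and nxt.start() == m.end():
            out.append(chr(0x10000 + ((n - 0xD800) << 10) + (lo - 0xDC00)))
            last = nxt.end()
            i += 2
        else:
            out.append("\uFFFD")  # unpaired surrogate: not a character
            last = m.end()
            i += 1
    out.append(text[last:])
    return "".join(out)
```

For the tweet in the question, each &#55357;&#56834; pair becomes U+1F602, since (0xD83D - 0xD800) * 0x400 + (0xDE02 - 0xDC00) + 0x10000 = 0x1F602. The result contains the real character, so the file must then be written in an encoding that can represent it, such as UTF-8.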

You can remove them with a regular-expression replacement after receiving the server response. A simple example in Python:

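import re

# Replace every "&#" with a space so the parser no longer sees numeric
# character references (this also affects valid ones such as &#60;).
pattern = re.compile(r'&#')
new_content = pattern.sub(' ', SERVER_RESPONSE)  # SERVER_RESPONSE: raw text from the server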

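What language/framework are you using? .NET? C? 6502 assembler? What does the error say? Please show us the XML.
I added more details, @SLaks.
I retrieve the data from this link: ... So how did the encoding mix-up happen?
@Songokute, hard to say, since the page prompts for a username and password. Judging from the XML file contents, the data seems to contain characters such as U+1F602.
Sorry for the late reply! You can log in with your Twitter account (since this is the Twitter Streaming API). Also, do you mean I should change the XML file's encoding from UTF-8 to UTF-16?
Yes, I did it, thanks @Korpela: I set the xmlObject's encoding to UTF-16 before writing it out :)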
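The thread does not say which language or library "xmlObject" belongs to, so the following is only a hypothetical illustration in Python's xml.etree.ElementTree, not the asker's code. It shows the point of the answer above: when the text holds the emoji as one code point, a standard serializer never emits surrogate references. Written as UTF-8, the character becomes a single 4-byte sequence; written as ASCII, it becomes one reference, &#128514;.

```python
import xml.etree.ElementTree as ET

# Text as it should look after correct decoding: the emoji is ONE code point.
tweet = "Marry me Juliet.\U0001F602"

root = ET.Element("root")
ET.SubElement(root, "text").text = tweet

# UTF-8 output: the emoji is stored as raw bytes F0 9F 98 82.
utf8_xml = ET.tostring(root, encoding="utf-8")

# ASCII-only output: non-ASCII characters become references, one per code point.
ascii_xml = ET.tostring(root, encoding="us-ascii")

print(utf8_xml)                   # bytes repr, safe on any console
print(ascii_xml.decode("ascii"))
```

Either file is well-formed XML; only output that splits the character into two surrogate references, as in the question, is rejected.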