Displaying correct PHP JSON output from unicode data
My friend and I are developing a Python scraper that uses beautifulsoup4 to parse a website. We filter parts of the page and "print" this output from the Python script.
The script is actually executed by PHP. However, we are struggling with a classic encoding problem. By default, Beautifulsoup returns unicode data, and that is what we expect in the PHP script.
What we want to do now is parse the output and encode it as valid JSON. In the process, we do not want unicode escape sequences (such as \u00f6) in the output, but their UTF-8 equivalents.
Part of the output from the PHP script looks like this:
["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\","," \"credits\": \"\","," \"credits_registered\": \"7.5\","," \"date\": \"2013-12-20\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"7.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0PRO1\","," \"detail_name_sv\": \"Projekt\","," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"3.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0LIT1\","," \"detail_name_sv\": \"Litteraturuppgift\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"B\""," },"," {"," \"comment\": \"\\u00a0\",
I tried different options for the PHP json_encode() function, such as JSON_UNESCAPED_UNICODE, but without success.
Any hints on what I might be doing wrong?
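The \u00f6 and \u00a0 sequences in the output above are what Python's json.dumps produces by default, since its ensure_ascii parameter defaults to True. The following is a minimal sketch of the Python side, assuming the scraper serializes its result with json.dumps (the question does not show this). The record `course` and the helper `to_json_bytes` are hypothetical names used only for illustration.

```python
import json


def to_json_bytes(data):
    """Serialize data as JSON, keep non-ASCII characters, and encode as UTF-8."""
    # ensure_ascii=False stops json.dumps from turning "ö" into "\u00f6".
    text = json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)
    return text.encode("utf-8")


# Hypothetical record, modeled on the output shown in the question.
course = {
    "comment": "\u00a0",  # a non-breaking space, as scraped from the page
    "course_name_sv": "Teori och metod för Medieteknik",
}

escaped = json.dumps(course)  # default behaviour: ASCII-only with \uXXXX escapes
raw = to_json_bytes(course)   # UTF-8 bytes with the characters themselves
```

Writing `raw` to `sys.stdout.buffer` rather than calling print avoids depending on the encoding Python picks for stdout when the script is launched from PHP. Both forms are valid JSON and decode to the same data; only the on-the-wire representation differs.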
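Update:
@Len_D, yes, I execute the Python script like this:
exec($command, $output)
Then I take the output and return it. When I try what you suggested, utf8_decode($output), I get an error saying "utf8_decode() expects parameter 1 to be string, array given". I then tried utf8_decode(json_encode($output)), which does give me output, but it is the same as before: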
["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\",
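You could send the header header('Content-Type: application/json'). Also read up on the PHP function utf8_decode.
utf8_encode: tried it, it made no difference. Please be more specific about how to use it in this context; this doesn't really help.
I assume you capture the output and store it in a PHP variable, such as $output = '…'. If so, use $output = utf8_decode(…) and the UTF-8 characters will be replaced with ISO-8859-1 characters. Have I understood your problem correctly?
Since an array is returned, it looks like you need to loop over each element of the array, apply utf8_decode to it, and update the array.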
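PHP's exec() fills its second argument with one array element per line of output, which would explain why the result above is a JSON array of strings with doubled backslashes: text that is already JSON is being encoded a second time. The sketch below reproduces that effect in Python and shows one way around it: join the lines, parse them once, then serialize once. The `lines` list and the `reencode` helper are hypothetical. In PHP the corresponding calls would presumably be implode(), json_decode() and json_encode() with JSON_UNESCAPED_UNICODE, but that mapping is an assumption and is not tested here.

```python
import json


def reencode(lines):
    """Join captured output lines, parse them once, and serialize once."""
    data = json.loads("\n".join(lines))
    # ensure_ascii=False keeps "ö" as a character instead of "\u00f6".
    return json.dumps(data, ensure_ascii=False)


# Hypothetical captured lines, shaped like the scraper output in the question.
lines = [
    "{",
    '    "course_id": "DM2572",',
    '    "course_name_sv": "Teori och metod f\\u00f6r Medieteknik"',
    "}",
]

# Encoding the list of lines directly yields an array of strings in which
# every quote and backslash of the inner JSON is escaped again.
double_encoded = json.dumps(lines)

# Parsing first and serializing once yields a single JSON object.
fixed = reencode(lines)
```

Applying a character-set conversion to each element would not remove the escapes, because \u00f6 in the captured text is six plain ASCII characters rather than a UTF-8 byte sequence; only a JSON parser turns it back into "ö".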