Python: extract a JSON object from an HTML <script>
I have a JSON object:
{
"review_body": "Beef noodles realism weathered modem tanto hotdog dolphin long-chain hydrocarbons 8-bit euro-pop tank-traps Tokyo narrative.-space j-pop franchise otaku faded RAF girl artisanal hotdog denim ablative systemic smart-Kowloon. Man construct dome smart-computer pen monofilament beef noodles rain garage geodesic bicycle San Francisco wonton soup dissident nodal point tower. Boat uplink film dead man modem warehouse. Nodal point jeans euro-pop render-farm nano-fetishism semiotics hacker gang. Futurity narrative youtube otaku Kowloon free-market drugs. Fluidity assassin Tokyo bicycle media assault concrete industrial grade ablative lights boat BASE jump A.I. post-stimulate carbon. Physical computer narrative city youtube math-neural assassin modem.",
"link": "http://www.getlost.com/store/acme/review/10607787#comment10607787",
"seller_id": "104523",
"survey_id": "9933447",
"loggedin_user": 0,
"store_rating": "8.02",
"store_thumb": "http://www.getlost.com/store/thumbnail/acme.jpg",
"store_name": "acme",
"username": "ronin666",
"rating": "1",
"ref": "RR,acme,104523"
}
embedded in:
<script LANGUAGE="javascript">
window.commentShare = $.extend((window.commentShare || {}), {
10607787: {
"review_body": "Beef noodles realism weathered modem tanto hotdog dolphin long-chain hydrocarbons 8-bit euro-pop tank-traps Tokyo narrative.-space j-pop franchise otaku faded RAF girl artisanal hotdog denim ablative systemic smart-Kowloon. Man construct dome smart-computer pen monofilament beef noodles rain garage geodesic bicycle San Francisco wonton soup dissident nodal point tower. Boat uplink film dead man modem warehouse. Nodal point jeans euro-pop render-farm nano-fetishism semiotics hacker gang. Futurity narrative youtube otaku Kowloon free-market drugs. Fluidity assassin Tokyo bicycle media assault concrete industrial grade ablative lights boat BASE jump A.I. post-stimulate carbon. Physical computer narrative city youtube math-neural assassin modem.",
"link": "http:\/\/www.getlost.com\/store\/acme\/review\/10607787#comment10607787",
"seller_id": "104523",
"survey_id": "9933447",
"loggedin_user": 0,
"store_rating": "8.02",
"store_thumb": "http:\/\/www.getlost.com\/store\/thumbnail\/acme.jpg",
"store_name": "acme",
"username": "ronin666",
"rating": "1",
"ref": "RR,acme,104523"
}
});
</script>
This should return:
<script language="javascript">
window.commentShare = $.extend(
(window.commentShare || {}), {
375015: {
"review_body": "I bought a Kodak LS443 form My Digital Palace in 2004. I also purchased a 5 year warranty. Now the camera does not work and I am unable to contact them. What do I do??? Am I just screwed???<br><br>Margaret Fuller<br>margaret_fuller@sbcglobal.net",
"link": "http:\/\/www.resellerratings.com\/store\/My_Digital_Palace\/review\/375015#comment375015",
"seller_id": "6930",
"survey_id": "385176",
"loggedin_user": 0,
"store_rating": "1.00",
"store_thumb": "http:\/\/www.resellerratings.com\/store\/thumbnail\/My_Digital_Palace.jpg",
"store_name": "My Digital Palace",
"username": "maf1059",
"rating": "1",
"ref": "RR,My_Digital_Pala,6930"
}
}
);
</script>
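If you have this JSON on your page and want to access it from JavaScript, you can do so by looping over the objects inside the window.commentShare object.
Here is a small test function you can add to the page to see how it works. It alerts one of your JSON values. For completeness, I have appended it to the end of your example.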
<script LANGUAGE="javascript">
window.commentShare = $.extend((window.commentShare || {}), {
10607787: {
"review_body": "Beef noodles realism weathered modem tanto hotdog dolphin long-chain hydrocarbons 8-bit euro-pop tank-traps Tokyo narrative.-space j-pop franchise otaku faded RAF girl artisanal hotdog denim ablative systemic smart-Kowloon. Man construct dome smart-computer pen monofilament beef noodles rain garage geodesic bicycle San Francisco wonton soup dissident nodal point tower. Boat uplink film dead man modem warehouse. Nodal point jeans euro-pop render-farm nano-fetishism semiotics hacker gang. Futurity narrative youtube otaku Kowloon free-market drugs. Fluidity assassin Tokyo bicycle media assault concrete industrial grade ablative lights boat BASE jump A.I. post-stimulate carbon. Physical computer narrative city youtube math-neural assassin modem.",
"link": "http:\/\/www.getlost.com\/store\/acme\/review\/10607787#comment10607787",
"seller_id": "104523",
"survey_id": "9933447",
"loggedin_user": 0,
"store_rating": "8.02",
"store_thumb": "http:\/\/www.getlost.com\/store\/thumbnail\/acme.jpg",
"store_name": "acme",
"username": "ronin666",
"rating": "1",
"ref": "RR,acme,104523"
}
});
function test(){
for (var i in window.commentShare) {
var myObj = window.commentShare[i];
alert(myObj.review_body);
}
}
test();
</script>
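Your problem is just a regex/split problem. I think people were a bit put off by the JavaScript. :)
Are you parsing it as a plain text file, or including it in your web app?
@ergonaut This is actually something I cobbled together with Beautiful Soup in Python, so I want to parse it.
Why don't you just access the object through the variable?
More information would help... Do you want to parse it with Python or with JavaScript? Is $.extend a call to jQuery or something else? As it stands, it sounds like you are just trying to do something very wrong.
Quite simple: strip off the wrapper and the extraneous lines and you are left with the JSON itself. The following removes the first four lines and the last three lines of the JavaScript snippet (and puts back the initial { that is lost in the process):
import json
# soup is assumed to be the BeautifulSoup object already built from the page's HTML
raw = "{" + "\n".join(str(soup.find("script")).split("\n")[4:-3])
If the <script> blocks on the page are not written in a uniform way (that is, it is not always the first four and last three lines that are extraneous), you may need to use a regular expression or some other matching instead. After that, you can go on to access the JSON:
json_obj = json.loads(raw)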
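For the case where the script blocks are not laid out uniformly, here is a minimal sketch of the regex-based alternative, using only the standard library. It is not the answerer's code. The function name `extract_comment_share` and the shortened `sample` string are made up for illustration, and the sketch assumes each entry looks like an unquoted numeric key followed by a JSON object, as in the examples above. A review text that itself contains something like `123: {` could confuse the pattern.

```python
import json
import re

# Hypothetical, shortened sample in the same shape as the script in the question.
sample = r'''<script LANGUAGE="javascript">
window.commentShare = $.extend((window.commentShare || {}), {
10607787: {
"link": "http:\/\/www.getlost.com\/store\/acme\/review\/10607787#comment10607787",
"seller_id": "104523",
"loggedin_user": 0,
"rating": "1"
}
});
</script>'''

# An unquoted numeric key, a colon, then (via lookahead) an opening brace.
ENTRY_START = re.compile(r'(\d+)\s*:\s*(?=\{)')


def extract_comment_share(html):
    """Return {review_id: dict} for each 'NNN: {...}' entry found in html."""
    decoder = json.JSONDecoder()
    reviews = {}
    pos = 0
    while True:
        match = ENTRY_START.search(html, pos)
        if match is None:
            break
        try:
            # raw_decode parses exactly one JSON value starting at the given
            # index and reports where it ended, so the trailing '});' and
            # '</script>' never reach the parser.
            obj, end = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            pos = match.end()  # not valid JSON here; keep searching
            continue
        reviews[match.group(1)] = obj
        pos = end  # resume after the object that was just parsed
    return reviews


reviews = extract_comment_share(sample)
for review_id, review in reviews.items():
    print(review_id, review["seller_id"], review["link"])
```

Because the object is located by pattern rather than by line position, the same function should cope with both layouts shown above (the `$.extend(` call on one line or split across several). Passing only the text of the relevant script tag, rather than the whole page, reduces the chance of accidental matches.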