JSON value is an HTML string: how do I parse it in Python?
I have a JSON file like this:
{
"entryLabel": "cat",
"entryContent": "<div class=\"entry_container\"><div class=\"entry lang_en-gb\" id=\"cat_1\"><span class=\"inline\"><h1 class=\"hwd\">cat<\/h1><span> [<\/span><span class=\"pron\" type=\"\">ˈkæt<a href=\"#\" class=\"playback\"><img src=\"https://api.collinsdictionary.com/external/images/redspeaker.gif?version=2013-10-30-1535\" alt=\"Pronunciation for cat\" class=\"sound\" title=\"Pronunciation for cat\" style=\"cursor: pointer\"/><\/a><audio type=\"pronunciation\" title=\"cat\"><source type=\"audio/mpeg\" src=\"https://api.collinsdictionary.com/media/sounds/sounds/0/081/08189/08189.mp3\"/>Your browser does not support HTML5 audio.<\/audio><\/span><span>]<\/span><\/span><div class=\"hom\" id=\"cat_1.1\"><span> <\/span><span class=\"gramGrp\"><span class=\"pos\">noun<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"bold\">1 <\/span><span class=\"lbl\"><span>(<\/span>domestic<span>)<\/span><\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">chat <em class=\"hi\">m<\/em><\/span><\/span><span class=\"cit\" id=\"cat_1.2\"><span>; <\/span><span class=\"quote\">Have you got a cat?<\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">Est-ce que tu as un chat?<\/span><\/span><\/span><span class=\"re\" id=\"cat_1.3\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to let the cat out of the bag<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">vendre la mèche<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.4\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">curiosity killed the cat<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">la curiosité est toujours punie<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.5\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to look like sth the cat dragged in<\/span><\/span><span 
class=\"inline\"><span>, <\/span><span class=\"orth\">to look like sth the cat brought in<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">être dans un état lamentable<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.6\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to play cat and mouse with sb<\/span><\/span><span class=\"inline\"><span>, <\/span><span class=\"orth\">to play a game of cat and mouse with sb<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">jouer au chat et à la souris avec qn<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.7\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to put the cat among the pigeons<\/span><\/span><span class=\"inline\"><span>, <\/span><span class=\"orth\">to set the cat among the pigeons<\/span><\/span><span class=\"lbl\"><span> (<\/span>British<span>)<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">jeter un pavé dans la mare<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.8\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">there's no room to swing a cat<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">on ne peut pas se tourner<\/span><\/span><\/div><!-- End of DIV sense--><\/span><\/div><!-- End of DIV sense--><div class=\"sense\"><span> <br/><\/span><span class=\"bold\">2 <\/span><span class=\"lbl\"><span>(= <\/span>big cat<span>)<\/span><\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">félin <em class=\"hi\">m<\/em><\/span><\/span><span class=\"cit\" id=\"cat_1.9\"><span>; <\/span><\/span><\/div><!-- End of DIV sense--><\/div><!-- End of DIV hom--><\/div><!-- End of DIV entry lang_en-gb--><\/div><!-- End of DIV entry_container-->\n"
}
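Ultimately, I need to extract the following from the HTML:
the value of the span with class "pron" (ˈkæt);
the src value of the source element;
the value of the span with class "pos" (noun);
and all the senses provided by the div elements with class "sense", for example: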
<div class="sense">
<span> <br/></span>
<span class="bold">2 </span><span class="lbl"><span>(= </span>big cat<span>)</span></span><span> </span><span class="cit lang_fr"><span class="quote">félin <em class="hi">m</em></span></span><span class="cit" id="cat_1.9"><span>; </span></span>
</div>
Rendered text: 2 (= big cat) félin m;
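Tried using the code shown at the end of this post.
Comment: You can parse it with BeautifulSoup or Scrapy. What do you want to get from the HTML?
Comment: I have edited the original post.
Comment: That looks good. So now I can parse the HTML with BeautifulSoup?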
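On the JSON side, nothing special is needed before handing the value to an HTML parser: the standard json module already decodes the \" and \/ escapes. A minimal sketch, using a shortened, made-up sample string (raw) rather than the full file:

```python
import json

# Shortened stand-in for the file contents (assumption: same escaping as the sample).
raw = r'{"entryLabel": "cat", "entryContent": "<h1 class=\"hwd\">cat<\/h1>"}'

data = json.loads(raw)

# After decoding, the value is ordinary HTML with plain quotes and slashes.
html = data["entryContent"]
print(html)  # <h1 class="hwd">cat</h1>
```

The decoded string can be passed straight to an HTML parser.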
import json
from bs4 import BeautifulSoup

# Load the JSON file; the 'with' block closes the file automatically.
with open('cat.json', encoding='utf-8') as f:
    data = json.load(f)

# "dictionaryCode" is absent from the sample shown above, so use .get() to avoid a KeyError.
print(data.get("dictionaryCode"))
print(data["entryLabel"])

# The "entryContent" value is an HTML string; parse it with an explicit parser.
entryContentHTML = BeautifulSoup(data["entryContent"], "html.parser")
print(entryContentHTML.prettify())
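To get the four pieces of data asked for in the question, something along these lines might work. It is only a sketch: the html string is a shortened stand-in for data["entryContent"], and it assumes the class names in the real file match the sample ("pron", "pos", "sense"). The variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for data["entryContent"] (assumption: same markup as the sample).
html = (
    '<div class="entry"><span class="pron" type="">ˈkæt'
    '<audio type="pronunciation"><source type="audio/mpeg" '
    'src="https://api.collinsdictionary.com/media/sounds/sounds/0/081/08189/08189.mp3"/>'
    'Your browser does not support HTML5 audio.</audio></span>'
    '<span class="pos">noun</span>'
    '<div class="sense"><span class="bold">2 </span>'
    '<span class="lbl"><span>(= </span>big cat<span>)</span></span><span> </span>'
    '<span class="cit lang_fr"><span class="quote">félin <em class="hi">m</em></span></span>'
    '<span class="cit" id="cat_1.9"><span>; </span></span></div></div>'
)

soup = BeautifulSoup(html, "html.parser")

# The first text node inside the "pron" span is the transcription;
# the audio fallback text comes later and is skipped.
pronunciation = soup.find("span", class_="pron").find(string=True).strip()

# The src attribute of the <source> element.
audio_src = soup.find("source")["src"]

# Text of the part-of-speech span.
pos = soup.find("span", class_="pos").get_text(strip=True)

# Text of every <div class="sense">, with runs of whitespace collapsed.
senses = [" ".join(div.get_text().split())
          for div in soup.find_all("div", class_="sense")]

print(pronunciation, audio_src, pos, senses, sep="\n")
```

Note that in the full file some "sense" divs are nested inside others, so find_all will return the inner ones as separate entries as well; filter them if only the top-level senses are wanted.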