How do I extract JSON from a script tag using Python?


I want to use Beautiful Soup to extract reviewCount from a script tag. I have tried different approaches without success.

<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>


This should work, though I am sure there is a more elegant way:

import json
from bs4 import BeautifulSoup

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

soup = BeautifulSoup(html, 'html.parser')
res = soup.find('script')
# json.loads parses a string; json.load expects a file-like object
json_object = json.loads(res.contents[0])
for language in json_object['languages']:
    print('{}: {}'.format(language['displayName'], language['reviewCount']))
Output:

Toutes les langues: 573
français: 567
English: 6

Import json, load the script contents with json.loads, then iterate over the result to get every reviewCount.

import json
from bs4 import BeautifulSoup
html='''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

soup=BeautifulSoup(html,"html.parser")
item=soup.select_one('script[data-initial-state="review-filter"]').text
jsondata=json.loads(item)
for item in jsondata['languages']:
    print(item['reviewCount'])

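Output:

573
567
6

Thank you, James. I tried the approach you mentioned above. My main problem is getting the reviewCount number. I get: TypeError: object of type 'Response' has no len()

"Tried different approaches without success." Can you share those attempts? From the markup you shared, it looks like you just need to get the tag's contents and parse the result. If you are struggling to extract the contents from the element, this is a duplicate of an existing question. If the problem is parsing the JSON, this is likewise a duplicate.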
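The TypeError quoted in the comment above usually means that a requests Response object was handed to BeautifulSoup directly instead of its decoded body. The sketch below assumes that is the cause. `FakeResponse` and `review_counts` are hypothetical names introduced only for illustration; `FakeResponse` stands in for a real response so the example runs without a network call.

```python
import json
from bs4 import BeautifulSoup


class FakeResponse:
    """Hypothetical stand-in for requests.Response; only .text is modelled."""

    def __init__(self, text):
        self.text = text


def review_counts(response):
    """Return {isoCode: reviewCount as int} from a response-like object."""
    # Pass the markup string (response.text), not the response object itself.
    soup = BeautifulSoup(response.text, "html.parser")
    tag = soup.find("script", attrs={"data-initial-state": "review-filter"})
    if tag is None or tag.string is None:
        return {}  # the script tag is missing or empty
    data = json.loads(tag.string)
    return {lang["isoCode"]: int(lang["reviewCount"]) for lang in data["languages"]}


page = FakeResponse('''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>''')

print(review_counts(page))  # {'all': 573, 'fr': 567, 'en': 6}
```

With a real request, the same function would receive the object returned by requests.get(...), since that object also exposes the body as .text.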
A regular expression can also pull the values straight out of the markup:

import re

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''


match = [item.group(1) for item in re.finditer('reviewCount":"(.+?)"', html)]

print(match)

Output:

['573', '567', '6']
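Matching the raw markup with a regular expression depends on the exact spacing and quoting of the JSON. As an alternative that needs no third-party package, the sketch below uses the standard library's html.parser to isolate the script body and then json.loads to parse it. This is a swapped-in technique, not one of the approaches posted above, and `ScriptJSONParser` is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser


class ScriptJSONParser(HTMLParser):
    """Collects the text of <script> tags with a given data-initial-state."""

    def __init__(self, state_name):
        super().__init__()
        self.state_name = state_name
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and dict(attrs).get("data-initial-state") == self.state_name:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

parser = ScriptJSONParser("review-filter")
parser.feed(html)
parser.close()

data = json.loads("".join(parser.chunks))
counts = [lang["reviewCount"] for lang in data["languages"]]
print(counts)  # ['573', '567', '6']
```

Because the JSON is parsed rather than pattern-matched, the result does not change if the keys are reordered or whitespace is added inside the script tag.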