Python BeautifulSoup find "inner content" "([{}]}"; here is my html file.


I can already get the value 2 (the content of the td element).

More importantly, I want to get every "city" and "g" from the soup.script part:

  • city: the region name
  • g: ["41.7089","123.439"], i.e. the latitude and longitude

How can I do that? I hope you can help!
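
For context, fetching that single value is straightforward with BeautifulSoup. A minimal, hedged sketch (using only the td fragment with id "cur_o3" that appears in the html shown in the answers below) might look like this:

from bs4 import BeautifulSoup

# fragment taken from the page; the td with id="cur_o3" holds the value 2
html_doc = '<td id="cur_o3" class="tdcur" align="center">2</td>'
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.find("td", id="cur_o3").get_text())  # -> 2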

Unfortunately, you will have to do the dirty work of manually parsing content that BeautifulSoup tries to keep away from you. In your case, however, it is fairly easy:

  • use BeautifulSoup to get the inner text of the script tag
  • find the position of mapInitWithData( in that string
  • also find the position of ]}]
  • cut out everything after the first string up to and including the second one
  • parse the JSON with json.loads()
  • take whatever you need from the resulting dictionaries

Sounds ugly? Not that ugly. Web scraping is always heuristic, and it makes little difference whether you rely on the structure of the HTML document or on the structure of the JavaScript code. When the site owner decides to change the site, you will have to rework it anyway.

Code for the lulz:

from bs4 import BeautifulSoup
import json

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<td id="cur_o3" class="tdcur" style="font-weight:bold;font-size:11px;" align="center">2</td>
<script type="text/javascript">
try {
if (isMapOpened == "open") {
mapInitWithData([...]}]/* 24 points -> 24 points */);
}}
</script>
"""

soup = BeautifulSoup(html_doc, "html.parser")
# usually you would wrap this in a try/except, but for the moment we let it raise
js = soup.find("script").get_text()
assert len(js) > 0
# markers for the start and end of the JSON
from_ = "mapInitWithData("
to_ = "]}]"
index_from = js.find(from_)
assert index_from > 0
index_to = js.find(to_)
assert index_to > 0
j = js[index_from+len(from_):index_to+len(to_)]
data = json.loads(j)
for row in data:
    print(row["city"], ":", [float(c) for c in row["g"]])
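
As the comment in the snippet above hints, on a live page you would normally guard these lookups instead of asserting. A minimal, hedged sketch of a more defensive variant (the helper name extract_map_data is mine, not part of the original answer) could look like this:

import json

def extract_map_data(js):
    """Return the parsed mapInitWithData payload, or None if it cannot be found.

    js is the text of the script tag, e.g. soup.find("script").get_text().
    """
    start_marker = "mapInitWithData("
    end_marker = "]}]"
    start = js.find(start_marker)
    end = js.find(end_marker)
    if start == -1 or end == -1:
        return None  # markers missing: the page layout probably changed
    payload = js[start + len(start_marker):end + len(end_marker)]
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None  # the cut-out string was not valid JSON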

You can use a regular expression to extract the data:

from bs4 import BeautifulSoup
import re
import json

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<td id="cur_o3" class="tdcur" style="font-weight:bold;font-size:11px;" align="center">2</td>
</script><script type="text/javascript">
try { 
if (isMapOpened == "open") {
mapInitWithData([{"aqi":"294","city":"D\u014dngru\u01cen, Shenyang","x":1249,"g":["41.7089","123.439"]},{"aqi":"263","city":"Liaoyang","extra":1,"x":4347,"g":["41.267244","123.236944"]},{"aqi":"263","city":"Ch\u00e9nli\u00e1ox\u012b l\u00f9, Shenyang","x":8755,"g":["41.7347","123.2444"]},{"aqi":"255","city":"Tieling","extra":1,"x":4346,"g":["42.22297","123.726163"]},{"aqi":"249","city":"h\u00fan n\u00e1n d\u014dng l\u00f9, Shenyang , Shenyang","x":5218,"g":["41.7561","123.535"]},{"aqi":"238","city":"Shenyang US Consulate","lvl":1,"x":496,"g":["41.7832349","123.4267266"]},{"aqi":"238","city":"Xiaoheyan, Shenyang","x":1254,"g":["41.7775","123.478"]},{"aqi":"219","city":"Liaoning University, Shenyang","x":1257,"g":["41.9228","123.3783"]},{"aqi":"193","city":"wenhua street, Shenyang , Shenyang","x":5215,"g":["41.765","123.41"]},{"aqi":"191","city":"Shenyang","x":1473,"g":["41.805698","123.431475"]},{"aqi":"191","city":"Taiyuan Street, Shenyang","x":1255,"g":["41.7972","123.3997"]},{"aqi":"189","city":"Shenfu new town, Fushun","x":4355,"g":["41.8417","123.7117"]},{"aqi":"188","city":"Wanghua district, Fushun , Fushun","extra":1,"x":5240,"g":["41.8469","123.8100"]},{"aqi":"188","city":"Fushun","extra":1,"x":1476,"g":["41.880872","123.957208"]},{"aqi":"188","city":"j\u012bnsh\u0101 ji\u0101ng l\u00f9 b\u011bi, Tieling , Tieling","extra":1,"x":5203,"g":["42.2217","123.7153"]},{"aqi":"182","city":"Tanglin Road , Shenyang , Shenyang","x":5216,"g":["41.8336","123.542"]},{"aqi":"179","city":"Caitun, Benxi","extra":1,"x":4364,"g":["41.3047","123.7308"]},{"aqi":"176","city":"Xihu, Benxi","extra":1,"x":4365,"g":["41.3369","123.7528"]},{"aqi":"172","city":"Xinfu district, Fushun , Fushun","extra":1,"x":5237,"g":["41.8594","123.9000"]},{"aqi":"170","city":"Weining, Benxi","extra":1,"x":4361,"g":["41.3472","123.8142"]},{"aqi":"162","city":"Shuncheng district, Fushun , Fushun","extra":1,"x":5239,"g":["41.883375","123.94504"]},{"aqi":"161","city":"y\u00f9n\u00f3ng l\u00f9, Shenyang","x":8756,"g":["41.9086","123.5953"]},{"aqi":"151","city":"Dongzhou district, Fushun , Fushun","extra":1,"x":5238,"g":["41.8625","124.0383"]},{"aqi":"122","city":"Dahuofang reservoir, Fushun , Fushun","extra":1,"x":5236,"g":["41.8864","124.0878"]}]/* 24 points -> 24 points */); 

"""  

soup = BeautifulSoup(html_doc, 'lxml')
script = soup.script.get_text()
map_search = re.search(r'mapInitWithData\((.*)/\*.*', script)
mapData = map_search.group(1)
mapDataObj = json.loads(mapData)[0]
print(mapDataObj["city"])
print(mapDataObj["g"])