Python 2.7 BeautifulSoup: replacing a set of HTML code with different code

I have a set of HTML code in my BeautifulSoup object that I want to replace with other code.

This is what I currently have in my BeautifulSoup object:

<html>
<body>
<table class="bt" width="100%">
<tr class="heading">
<th scope="col"> </th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col"> </th>.............</body></html>

 
Desired code:

<html>
<body>
<table class="bt" width="100%">
<tr class="heading">
<th scope="col"> </th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col"> </th>.............</body></html>

 
I tried this, but it did not work:

soup.replace('<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>', '<th class="tho" scope="col"><b>O</b></th>')

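In your own solution you are already hinting at string replacement rather than actual insertion into the HTML tree. That is because the HTML you are starting from is really bad.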

One solution is to append the tags to the original tree generated by BeautifulSoup:

from bs4 import BeautifulSoup
import re

start_str = """<html><body><table class="bt" width="100%"><tr class="heading"><th scope="col">&nbsp;</th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col">&nbsp;</th>.............</body></html>"""
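# Remark: the parsed tree will stop right after the first '</html>'.
soup = BeautifulSoup(start_str)
# Grab everything from the first cut-off <th> to the end of the raw string.
substr = re.findall('<th class="thm".*', start_str, re.DOTALL)
# Parse that leftover fragment on its own, then move its <th> cells into the original row.
subsoup = BeautifulSoup(substr[0])
for tag in subsoup.find_all('th'):
    soup.tr.append(tag)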
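That is really bad HTML to start with. BeautifulSoup will crop it at the first </html> it finds. However, your desired output is not ideal either: you are missing the closing tags for tr and table. Or can we assume they are included in the dots?

Yes, they are included in the dots. The only problem is the unwanted closing div and html tags added on line 7 of the code. I just need to remove them and replace them with the desired code mentioned above. Thanks for the reply.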
substr = start_str.split('</html></html>')[1]
to_remove = '</tr></table></div></div></div></div></div></div></body></html></html>'
soup = BeautifulSoup(''.join(start_str.split(to_remove)))
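
The string-level cleanup above can be checked on a smaller input. This is only a minimal sketch under some assumptions: `raw_html` and `stray` are hypothetical, shortened stand-ins for the markup in the question. The parser is set explicitly to the standard-library `html.parser` backend. `str.replace` is used in place of the split/join idiom; both simply drop the substring from the raw text before it is parsed.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the scraped markup in the question.
raw_html = (
    '<html><body><table class="bt" width="100%"><tr class="heading">'
    '<th class="th-heading" scope="col">B</th>'
    '<th class="tho" scope="col"><b>O</b></th>'
    '</tr></table></div></div></body></html></html>'
    '<th class="thm" scope="col"><b>M</b></th>'
    '<th class="thr" scope="col"><b>R</b></th>'
    '</tr></table></body></html>'
)

# The stray run of closing tags that ends the document too early.
stray = '</tr></table></div></div></body></html></html>'

# Remove the stray tags from the raw string first, then parse the result.
cleaned = raw_html.replace(stray, '', 1)
soup = BeautifulSoup(cleaned, 'html.parser')

# All header cells should now sit inside the single heading row.
headers = [th.get_text() for th in soup.tr.find_all('th')]
print(headers)
```

The point of doing the replacement before parsing is that the raw markup is still a plain string at that stage, so ordinary string methods apply; once it has been parsed, changes have to go through the tree instead.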