Python: trying to replace <em> tags with <a>

Tags: python, tags, beautifulsoup, replacewith

From what I understand, l is the a tag, but you want to replace the em inside it with its text. In other words, the a element should keep its full text while the em markup around part of it goes away. The code from the question:
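import requests
import string
from bs4 import BeautifulSoup, Tag
[...]
def disease_spider(maxpages):
    i = 0
    while i <= maxpages:
        # alpha is presumably defined in the part of the script elided above
        url = 'http://www.cdc.gov/DiseasesConditions/az/' + alpha[i] + '.html'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for l in soup.findAll('a', {'class': 'noLinking'}):
            x = l.find("em")  # x is the <em> tag itself, or None
            if x is not None:
                # Problem line: x.em searches inside the <em> and yields None,
                # and the return exits the function at the first match.
                return x.em.replaceWith(Tag('a'))
        i += 1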
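A small check may make the failure in the question's return line easier to see. This is only a sketch: the one-link document below is made up for illustration, and the names link, nested and error are not from the question.

```python
from bs4 import BeautifulSoup

# Made-up sample: a single link with an <em> child.
html = '<a class="noLinking" href="#">Hib (<em>Haemophilus influenzae</em>)</a>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a", {"class": "noLinking"})
x = link.find("em")   # x is the <em> tag itself
nested = x.em         # looks for an <em> inside x; there is none

try:
    # Same call shape as in the question: a method call on the missing tag.
    nested.replaceWith("text")
    error = None
except AttributeError as exc:
    error = exc       # None has no replaceWith method
```

Calling replace_with on x itself, rather than on x.em, avoids the problem.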
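As a side note, the replacement might not be needed at all, because the .text of an a tag gives you the full text of the node, including its children (the IPython session at the end of this page shows this). If you do want to replace every em with its text, select the em tags that are direct children of a.noLinking and swap each one for its text:
for em in soup.select('a.noLinking > em'):
    em.replace_with(em.text)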
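The replace_with loop can be tried on the sample link from this thread. A minimal sketch, assuming the html.parser backend (the thread does not name a parser) and using cleaned as an illustrative variable name:

```python
from bs4 import BeautifulSoup

data = """
<a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
including Hib Infection (<em>Haemophilus influenzae</em> Infection)
</a>
"""
soup = BeautifulSoup(data, "html.parser")

# Swap every <em> that is a direct child of a.noLinking for its plain text.
for em in soup.select("a.noLinking > em"):
    em.replace_with(em.text)

# The link keeps its text, but the <em> markup is gone from the tree.
cleaned = soup.a.text.strip()
has_em = soup.find("em") is not None
```

select returns a list of matches, so replacing nodes while looping over it is safe.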
Comment: Is it possible to find all such tags inside list tags?
Reply: @ks4929 Yes. For example, replace a.noLinking > em with li a.noLinking > em.
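The narrower selector from the reply can be checked on a small document. This is a sketch under stated assumptions: the markup below is invented for illustration, with one link inside a list item and one outside, and remaining is an illustrative name.

```python
from bs4 import BeautifulSoup

# Invented markup: one matching link inside <li>, one inside <p>.
html = """
<ul>
  <li><a class="noLinking" href="#in-list">inside (<em>list item</em>)</a></li>
</ul>
<p><a class="noLinking" href="#outside">outside (<em>paragraph</em>)</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# Only <em> tags whose a.noLinking parent sits somewhere under an <li> match.
for em in soup.select("li a.noLinking > em"):
    em.replace_with(em.text)

# The <em> outside the list is left untouched.
remaining = [em.text for em in soup.find_all("em")]
```

The space in the selector is the CSS descendant combinator, so the link does not have to be a direct child of the li.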
Expected result, with the em markup replaced by its text:
<a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
including Hib Infection (Haemophilus influenzae Infection)
</a>
The .text check from the side note, as an IPython session:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """
...: <a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
...: including Hib Infection (<em>Haemophilus influenzae</em> Infection)
...: </a>
...: """
In [3]: soup = BeautifulSoup(data)
In [4]: print(soup.a.text)
including Hib Infection (Haemophilus influenzae Infection)
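If only the link text is needed, the same idea can be wrapped in a function that leaves the tree unmodified. A minimal sketch: link_texts is a hypothetical helper name, and both the html.parser backend and the whitespace normalisation are assumptions added here, not part of the thread.

```python
from bs4 import BeautifulSoup


def link_texts(html):
    """Return the whitespace-normalised text of every a.noLinking element."""
    soup = BeautifulSoup(html, "html.parser")
    # .text already includes the text of child tags such as <em>.
    return [" ".join(a.text.split()) for a in soup.select("a.noLinking")]


sample = """
<a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
including Hib Infection (<em>Haemophilus influenzae</em> Infection)
</a>
"""
texts = link_texts(sample)
```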