Python 3.3: BeautifulSoup text between two classes
With this code, I get the output shown below it:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('text.html'))
contain = []
contain = soup.find_all('div',{'class':'day'})
del contain[2::]
print (contain)
[<div class="day"><div class="content">Idag<span id="updatedby">, by<b>Karl</b> (100)</span></div></div>, <div class="day"><div class="content">2014-01-14<span id="updatedby">, by <b>Person</b> (50)</span></div></div>]
You can do it like this:
from bs4 import BeautifulSoup
data = '''
<div class="day"><div class="content">Idag<span id='updatedby'>, by <b>Karl</b> (100) </span></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon- Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text1 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text2 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text3 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="day"><div class="content">2014-01-14<span id='updatedby'>, by<b>Person</b> (50)</span></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text4 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
'''
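# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, 'html.parser')
result = []
first_day = soup.find_all('div', {'class': 'day'})[0]
for tag in first_day.next_siblings:
    # Text nodes (the newlines between the divs) have no tag name or attributes,
    # so check the name before looking at the class list.
    if getattr(tag, 'name', None) == 'div' and 'day' in tag.get('class', []):
        break  # reached the next class="day" div
    result.append(tag)
for e in result:
    print(e)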
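This code assumes you are working with a flat set of sibling nodes (no nesting). It starts at the first class="day" div, then steps through the following siblings, appending each one to the result list, until it reaches the next class="day" div, at which point it breaks out of the loop.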
Result:
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon- Hemsida.gif"/><div class="text"> Sample text1 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon-Hemsida.gif"/><div class="text"> Sample text2 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon-Hemsida.gif"/><div class="text"> Sample text3 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
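If the goal is the text under every day heading rather than only the first, the same sibling walk can be repeated for each class="day" div. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `is_day` and `texts_by_day` are hypothetical, the markup is a trimmed-down stand-in for the sample above, and it assumes each link div holds its text in a nested class="text" div.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (hypothetical sample data).
HTML = '''
<div class="day"><div class="content">Idag</div></div>
<div class="link"><div class="text"> Sample text1 </div></div>
<div class="link"><div class="text"> Sample text2 </div></div>
<div class="day"><div class="content">2014-01-14</div></div>
<div class="link"><div class="text"> Sample text4 </div></div>
'''


def is_day(node):
    """True for a <div> whose class list contains "day"; False for text nodes."""
    return getattr(node, 'name', None) == 'div' and 'day' in node.get('class', [])


def texts_by_day(html):
    """Map each day heading to the stripped texts of the link divs that follow it."""
    soup = BeautifulSoup(html, 'html.parser')
    groups = {}
    for day in soup.find_all('div', class_='day'):
        heading = day.find('div', class_='content').get_text(strip=True)
        texts = []
        for sibling in day.next_siblings:
            if is_day(sibling):
                break  # reached the next day heading
            if getattr(sibling, 'name', None) is None:
                continue  # skip whitespace text nodes between the divs
            text_div = sibling.find('div', class_='text')
            if text_div is not None:
                texts.append(text_div.get_text(strip=True))
        groups[heading] = texts
    return groups


groups = texts_by_day(HTML)
for heading, texts in groups.items():
    print(heading, texts)
```

Like the answer's code, this only works when the day divs and link divs are siblings at the same level; nested markup would need a different traversal.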