Python 3.3 BeautifulSoup: text between two sets of classes


This code gives me the following output:

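from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
with open('text.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Keep only the first two <div class="day"> elements
contain = soup.find_all('div', {'class': 'day'})[:2]
print(contain)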
[<div class="day"><div class="content">Idag<span id="updatedby">, by<b>Karl</b> (100)</span></div></div>, <div class="day"><div class="content">2014-01-14<span id="updatedby">, by <b>Person</b> (50)</span></div></div>]

You can do it like this:

from bs4 import BeautifulSoup, Tag

data = '''
<div class="day"><div class="content">Idag<span id='updatedby'>, by <b>Karl</b> (100)     </span></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-   Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text1 </div></a><br />   <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a    href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text2 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text3 </div></a><br />  <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a   href="?p=komment&id=xxxxx">18 comments</a></div></div>
<div class="day"><div class="content">2014-01-14<span id='updatedby'>, by<b>Person</b>  (50)</span></div></div>
<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img src="img/ikon-Hemsida.gif" class="type" alt="Hemsida" /><div class="text"> Sample text4 </div></a><br /> <div class="sbar"><img src="img/comment.gif" class="comment" alt="Kommentarer" /> <a href="?p=komment&id=xxxxx">18 comments</a></div></div> 
'''
soup = BeautifulSoup(data, 'html.parser')

result = []
# Start at the first <div class="day"> and walk over the siblings that follow it
first_day = soup.find('div', {'class': 'day'})
for sibling in first_day.next_siblings:
    # Stop at the next <div class="day">; text nodes such as '\n' are not Tags
    if isinstance(sibling, Tag) and 'day' in sibling.get('class', []):
        break
    result.append(sibling)
for e in result:
    print(e)

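This code assumes you are dealing with a flat set of sibling nodes (no nesting). It starts at the first class="day" div, then steps through the siblings and appends each one to the result list until it reaches the next class="day" div, at which point it breaks out of the loop.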
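
If you need the entries for every day rather than only the first one, the same sibling walk can be repeated per header. The following is only a sketch under assumptions: the `SAMPLE` markup is a shortened, hypothetical version of the data above, and the helper names `is_day` and `group_by_day` are made up for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical, shortened sample in the same shape as the data above
SAMPLE = '''
<div class="day"><div class="content">Idag</div></div>
<div class="link"><div class="text"> Sample text1 </div></div>
<div class="link"><div class="text"> Sample text2 </div></div>
<div class="day"><div class="content">2014-01-14</div></div>
<div class="link"><div class="text"> Sample text4 </div></div>
'''


def is_day(node):
    """Return True for a tag whose class list contains 'day'."""
    return isinstance(node, Tag) and 'day' in node.get('class', [])


def group_by_day(html):
    """Map each day header's text to the text of the tags that follow it."""
    soup = BeautifulSoup(html, 'html.parser')
    groups = {}
    for day in soup.find_all('div', class_='day'):
        texts = []
        for sibling in day.next_siblings:
            if is_day(sibling):
                break  # the next day header ends this group
            if isinstance(sibling, Tag):  # skip bare text nodes such as '\n'
                texts.append(sibling.get_text(strip=True))
        groups[day.get_text(strip=True)] = texts
    return groups


groups = group_by_day(SAMPLE)
for day, texts in groups.items():
    print(day, texts)
```

Like the answer's code, this assumes the day headers and the link blocks are flat siblings. If two headers had the same text, the later group would overwrite the earlier one in the dictionary.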

@ Glad I could help. If this solved your problem, consider "accepting" this answer by clicking below the upvote/downvote buttons.

Result of the code above:

<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon-   Hemsida.gif"/><div class="text"> Sample text1 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&amp;id=xxxxx">18 comments</a></div></div>


<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon-Hemsida.gif"/><div class="text"> Sample text2 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&amp;id=xxxxx">18 comments</a></div></div>


<div class="link"><a href="out.php?id=XXXXXX" target="_blank"><img alt="Hemsida" class="type" src="img/ikon-Hemsida.gif"/><div class="text"> Sample text3 </div></a><br/> <div class="sbar"><img alt="Kommentarer" class="comment" src="img/comment.gif"/> <a href="?p=komment&amp;id=xxxxx">18 comments</a></div></div>