Python 2.7: using BeautifulSoup to split HTML into the required format


I have an HTML snippet that looks like this:

<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>

This is what I have tried so far:

from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
temp = BeautifulSoup(data)
link = temp.select('.myTestCode')

# neither of these attempts printed the desired output
print str(link).split('<strong>')
print ''.join(link.stripped_strings)

One possible approach:

from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
temp = BeautifulSoup(data)

# get the individual <strong> elements inside the div
strongs = temp.select('.myTestCode > strong')

# map each <strong> element to its text content, concatenated with the text node that follows it
result = [x.text + x.next_sibling.strip() for x in strongs]

# join the pieces with a comma and print
print ', '.join(result)

# output:
# Abc: test1, Def: test2
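
The approach above assumes every <strong> is immediately followed by a text node. A slightly more defensive variant is sketched below; it is only an illustration, not the answer's own code. It assumes a reasonably recent bs4 release and names the built-in html.parser explicitly (both are assumptions, not something the question states). The names soup, pairs and joined are made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""

# Assumption: the built-in parser is good enough for this small snippet.
soup = BeautifulSoup(data, "html.parser")

pairs = []
for strong in soup.select(".myTestCode > strong"):
    # "Abc: " -> "Abc"
    label = strong.get_text().strip().rstrip(":")
    # The node right after </strong>; it may be missing or be another tag.
    sibling = strong.next_sibling
    value = sibling.strip() if isinstance(sibling, NavigableString) else ""
    pairs.append((label, value))

joined = ", ".join("{0}: {1}".format(label, value) for label, value in pairs)

# A single-argument print(...) runs on both Python 2.7 and Python 3.
print(joined)
```

Keeping the (label, value) pairs around, rather than only the joined string, makes it easy to build a dict or a different output format later.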

I also tried str(link).split(...) and ''.join(link.stripped_strings), where link = temp.select('.myTestCode') and temp is built from the data. I have added the data and the code to the post.
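
A note on why the two attempts from the question do not work, offered as a sketch rather than a definitive explanation: select() returns a list of matching elements, not a single element. So str(link) is the string form of a list, and link.stripped_strings raises AttributeError. Assuming a recent bs4 release (select_one was added in 4.4.0, as far as I know) and the built-in html.parser, one could pick out the single div first and then pair up its strings. The pairing assumes that labels and values strictly alternate, which holds for the sample but not necessarily for other markup.

```python
from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""

soup = BeautifulSoup(data, "html.parser")

# select() gives a list-like result, which has no stripped_strings attribute.
matches = soup.select(".myTestCode")
has_attr = hasattr(matches, "stripped_strings")

# select_one() gives the first matching element (or None if nothing matches).
div = soup.select_one(".myTestCode")

# Whitespace-only strings are skipped, the rest are stripped.
parts = list(div.stripped_strings)

# Assumption: strings alternate label, value, label, value, ...
lines = [label + " " + value for label, value in zip(parts[0::2], parts[1::2])]
joined = ", ".join(lines)

print(joined)
```

Indexing the list, as in soup.select('.myTestCode')[0], would work equally well in place of select_one().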
