Extracting specific text with BS4 in Python?

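I'm trying to extract certain pieces of text with BS4. Here is the sample HTML (reproduced as txt in the answer's code below).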


If I understand correctly, you want to extract three separate strings. You can call .get_text() with a custom separator= character and then split on that character:

from bs4 import BeautifulSoup


txt = '''
<tr id="_Gonzaga" class="seedrow">
<td title="Click to show/hide ranks" class='lowrowclick' style="text-align:center;font-size:8px">2</td>
<td  id='Gonzaga' class="teamname"><a href="team.php?team=Gonzaga&year=2019" style="text-decoration: none;">Gonzaga<span class="lowrow" style="font-size:10px"><br/>&nbsp;&nbsp;&nbsp;1 seed, <span style='background-color:#BAE2C6'>Elite Eight</span></span></a></td>
</tr>'''

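soup = BeautifulSoup(txt, 'html.parser')
# find_all is the current name; findAll is a legacy alias
data = soup.find_all('tr', attrs={"class": "seedrow"})

for item in data:
    team_name = item.find('td', class_='teamname')

    # Join the cell's text fragments with '|', then split them apart again.
    # The unpacking assumes the cell holds exactly three fragments.
    a, b, c = team_name.get_text(strip=True, separator='|').split('|')

    print(a)             # team name
    print(b.strip(','))  # seed, without the trailing comma
    print(c)             # tournament result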

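If you don't like the \xa0 characters (the non-breaking spaces that &nbsp; decodes to), you can remove them with .strip('\xa0') or .replace('\xa0', '').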
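To make the difference between those calls concrete, here is a minimal sketch on a plain string. The variable names are my own, and raw is only meant to mimic the text that follows <br/> in the sample HTML once the entities are decoded. Note that str.strip() with no argument already treats \xa0 as whitespace, so get_text(strip=True) should not leave any at the edges of a fragment.

```python
# Illustrative string: three non-breaking spaces, the seed text, a trailing space
raw = '\xa0\xa0\xa01 seed, '

# No argument: strips all Unicode whitespace, \xa0 included
stripped_default = raw.strip()

# Explicit argument: strips only \xa0 from the ends, so the trailing space stays
stripped_nbsp = raw.strip('\xa0')

# Removes every \xa0 wherever it occurs, not just at the ends
replaced = raw.replace('\xa0', '')

print(repr(stripped_default))
print(repr(stripped_nbsp))
print(repr(replaced))
```

.replace() is the one to reach for when the non-breaking spaces sit in the middle of a string, since .strip() only touches the two ends.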
The answer's script (the for loop over the seedrow rows) prints:

Gonzaga
1 seed
Elite Eight
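As a possible variant, not part of the original answer, the same three strings can be collected with Beautiful Soup's .stripped_strings generator. This avoids picking a separator character that might also occur in the text, and it does not raise if a cell happens to hold more or fewer than three fragments. The HTML below is a trimmed copy of the sample (inline styles and the link target are dropped for brevity), and the names html, rows and parts are my own.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample row; attributes irrelevant to the text are removed
html = '''
<tr id="_Gonzaga" class="seedrow">
<td class="lowrowclick">2</td>
<td id="Gonzaga" class="teamname"><a href="#">Gonzaga<span class="lowrow"><br/>&nbsp;&nbsp;&nbsp;1 seed, <span>Elite Eight</span></span></a></td>
</tr>'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for cell in soup.find_all('td', class_='teamname'):
    # stripped_strings yields each text fragment with surrounding whitespace
    # removed and skips fragments that are empty after stripping
    parts = [s.rstrip(',') for s in cell.stripped_strings]
    rows.append(parts)

print(rows)
```

Each entry in rows is a list of fragments for one team cell, so you can check len(parts) before unpacking if the page layout is not guaranteed.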