Python web scraping not pulling back titles correctly

python

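I'm trying to extract only the titles from a web page's source. My code can currently pull out all the right rows, but I can't figure out how to make it extract only the titles.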

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URL = 'https://sc2replaystats.com/replay/playerStats/10774659/8465' 
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')
tb = soup.find('table', class_='table table-striped table-condensed')
for link in tb.find_all('tr'):
    name = link.find('td')
    print(name.get_text('title'))

I expect it to output only:

Nexus
Pylon
Gateway
Assimilator
etc.

But instead I get an error:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    print(name.get_text().strip())
AttributeError: 'NoneType' object has no attribute 'get_text'

I don't understand what I'm doing wrong, because from what I've read it should only pull back the desired results.

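Try the code below. Your first row contains table headers rather than table data, so when you look for a td tag there, the result will be None.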

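So add a condition that checks whether the element (here the span inside the td) can actually be found, and only then read its title, like this: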

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package
import requests

URL = 'https://sc2replaystats.com/replay/playerStats/10774659/8465'
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')
tb = soup.find('table', class_='table table-striped table-condensed')
for link in tb.find_all('tr'):
    name = link.find('span')
    if name is not None:  # process only when the element is available
        print(name['title'])
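To see the failure and the fix without hitting the live site, here is a minimal offline sketch. The HTML string is hypothetical: it assumes the real table has a header row of th cells followed by data rows whose first td holds a span carrying the unit name in its title attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the live page (assumed
# structure: a header row of <th> cells, then data rows whose first <td>
# holds a <span> with the unit name in its title attribute).
HTML = """
<table class="table table-striped table-condensed">
  <tr><th>Unit</th><th>Count</th></tr>
  <tr><td><span class="blizzard_icons_single" title="Nexus"></span> Nexus</td><td>1</td></tr>
  <tr><td><span class="blizzard_icons_single" title="Pylon"></span> Pylon</td><td>8</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
tb = soup.find("table", class_="table table-striped table-condensed")
rows = tb.find_all("tr")

# The header row has no <td>, so find() returns None; calling
# .get_text() on that None is what raises the AttributeError.
header_td = rows[0].find("td")
print(header_td)  # None

titles = []
for row in rows:
    span = row.find("span")
    if span is not None:  # skip rows without a <span>, e.g. the header
        titles.append(span["title"])
print(titles)  # ['Nexus', 'Pylon']
```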

I think you should use something like:

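for link in tb.find_all('tr'):
    # select() returns a list of matches (elements with a title
    # attribute inside a <td>), so loop over it
    for name in link.select('td [title]'):
        # read the title attribute itself; get_text('title') would only
        # use 'title' as a separator between text fragments
        print(name['title'])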

As far as I can tell, the string comes back empty because there is no title tag; you are trying to get the text of the title attribute from the td tag.
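The difference between get_text('title') and reading the title attribute can be shown on a single cell. This is a sketch on a hypothetical td fragment, not the real page: get_text's first positional argument is the separator used to join text fragments, so it never looks at attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup, assumed for illustration only.
cell = BeautifulSoup(
    '<td><span title="Nexus"></span> Nexus <b>x1</b></td>', "html.parser"
).td

# get_text()'s first positional argument is the *separator* used to join
# the text pieces, so 'title' is glued between them; no attribute is read.
joined = cell.get_text("title")
print(repr(joined))  # ' Nexus titlex1'

# Attributes are read by subscripting or with .get(); .get() returns None
# instead of raising KeyError when the attribute is missing.
span = cell.find("span")
print(span["title"])      # Nexus
print(cell.get("title"))  # None: the <td> itself has no title
```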

bkyada's answer is perfect, but if you want another solution:

In the for loop, instead of finding the td, find all the spans, iterate over them, and read each one's title attribute:

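for link in tb.find_all('tr'):
    # look at every <span> in the row, not just the first one
    for container in link.find_all('span'):
        if container.has_attr('title'):
            print(container['title'])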
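The attribute check can also be pushed into the search itself. A minimal sketch, assuming a hypothetical row where one span has a title and another does not: passing title=True to find_all keeps only tags that have that attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical row markup (assumed): one span has a title, one does not.
ROW = """
<table><tr>
  <td><span class="blizzard_icons_single" title="Gateway"></span>
      <span class="count">4</span></td>
</tr></table>
"""

row = BeautifulSoup(ROW, "html.parser").find("tr")

# title=True keeps only <span> tags that *have* a title attribute,
# so spans without one never reach the span["title"] lookup.
titles = [span["title"] for span in row.find_all("span", title=True)]
print(titles)  # ['Gateway']
```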

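It is more efficient to simply use the class name to identify the elements that have the title attribute, since every row has one in the first column: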

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URL = 'https://sc2replaystats.com/replay/playerStats/10774659/8465' 
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')
tb = soup.find('table', class_='table table-striped table-condensed')
titles = [i['title'] for i in tb.select('.blizzard_icons_single')]
print(titles)
titles = {i['title'] for i in tb.select('.blizzard_icons_single')}  #set of unique
print(titles)
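The list and set versions above differ in more than duplicates: a set does not preserve document order. This offline sketch runs both on hypothetical markup (the repeated Probe row is made up for illustration) and adds dict.fromkeys as one standard way to drop duplicates while keeping first-seen order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page (assumed structure).
HTML = """
<table class="table table-striped table-condensed">
  <tr><th>Unit</th></tr>
  <tr><td><span class="blizzard_icons_single" title="Probe"></span></td></tr>
  <tr><td><span class="blizzard_icons_single" title="Pylon"></span></td></tr>
  <tr><td><span class="blizzard_icons_single" title="Probe"></span></td></tr>
</table>
"""

tb = BeautifulSoup(HTML, "html.parser").find(
    "table", class_="table table-striped table-condensed"
)

# The list keeps document order and duplicates.
titles_list = [i["title"] for i in tb.select(".blizzard_icons_single")]
# A set drops duplicates, but its iteration order is arbitrary.
titles_set = {i["title"] for i in tb.select(".blizzard_icons_single")}
# dict.fromkeys drops duplicates while keeping first-seen order.
titles_ordered = list(dict.fromkeys(titles_list))

print(titles_list)     # ['Probe', 'Pylon', 'Probe']
print(titles_ordered)  # ['Probe', 'Pylon']
```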

Since the title attribute only appears in that column, you could also use the slightly slower attribute selector:

titles = {i['title'] for i in tb.select('[title]')}  #set of unique
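The attribute selector matches any element carrying a title, so it only stays this precise while the search is scoped to the table. A sketch on hypothetical markup, where a link outside the table also has a title:

```python
from bs4 import BeautifulSoup

# Hypothetical page (assumed): a title attribute also appears outside the table.
HTML = """
<a href="/" title="Home">home</a>
<table class="table table-striped table-condensed">
  <tr><td><span title="Assimilator"></span></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
tb = soup.find("table", class_="table table-striped table-condensed")

# [title] matches *any* element with that attribute, so the result
# depends on where you search: the table only vs. the whole page.
in_table = {i["title"] for i in tb.select("[title]")}
whole_page = {i["title"] for i in soup.select("[title]")}
print(in_table)            # {'Assimilator'}
print(sorted(whole_page))  # ['Assimilator', 'Home']
```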

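Do you get any results, or does it error out straight away?

I don't get any results; the error listed above is the only response.

That's what I was getting at in my comment above. You have to be careful when accessing elements that might not exist. So whether through an if statement like the one @bkyada used above or through a try/except block, you should make sure you've covered your bases where errors are concerned.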
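For the try/except route mentioned in the comment, here is a minimal sketch on hypothetical markup (a header row, a row whose span lacks a title, and a normal row). Which exception fires depends on the operation: subscripting None raises TypeError, a missing attribute raises KeyError, and calling a method such as get_text() on None raises AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumed): header row, a row whose span lacks a
# title, and a normal data row.
HTML = """
<table>
  <tr><th>Unit</th></tr>
  <tr><td><span></span></td></tr>
  <tr><td><span title="Nexus"></span></td></tr>
</table>
"""

titles = []
for row in BeautifulSoup(HTML, "html.parser").find_all("tr"):
    try:
        titles.append(row.find("span")["title"])
    except TypeError:
        # no <span> at all: find() returned None, which is not subscriptable
        continue
    except KeyError:
        # <span> present but without a title attribute
        continue
print(titles)  # ['Nexus']
```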