Python: Scraping table contents across a dropdown list with BeautifulSoup

python web-scraping

I want to scrape the search results for all players at every position.
With the following code I have already retrieved all RB players:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.cbssports.com/nfl/playersearch?POSITION=RB&print_rows=9999')
html = html_text.text
soup = BeautifulSoup(html, 'html.parser')

# The results live in the table with class "data"; rows alternate between row1 and row2
player_table = soup.find('table', class_='data')
for tr in player_table.find_all('tr', class_=['row1', 'row2']):
    tds = tr.find_all('td')
    print(("Player:%s , Position:%s , Team: %s") % (tds[0].text, tds[1].text, tds[2].text))

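What I am stuck on now is scraping the players for the other positions in the dropdown list. What is the best way to do that?
The idea is quite simple: scrape all the position values from the dropdown, then modify the URL for each position and fetch all of its players. Code: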

from bs4 import BeautifulSoup
import requests

main_url = "https://www.cbssports.com/nfl/playersearch"

soup = BeautifulSoup(requests.get(main_url).text, 'html.parser')

# scrape all positions
positions = [o["value"] for o in soup.find("select", {'name': "POSITION"}).find_all("option")]

for position in positions:
    url = f"{main_url}?POSITION={position}&print_rows=9999"
    # find all players
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    for tr in soup.find("table", class_="data").find_all("tr", class_=["row1", "row2"]):
        tds = tr.find_all('td')
        print(("Player:%s , Position:%s , Team: %s") % (tds[0].text, tds[1].text, tds[2].text))

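Awesome! I have a follow-up question (I should probably ask it separately): each player has a follow-up page (linked from tds[0]) that lists the player's details, such as height/weight, age, hometown and so on. Is it possible to scrape that information as well?
@Meruemu Sure. You can extract the player URL with tds[0].a["href"], create a new soup object, and scrape the new page. All of the player information is in a div tag with the CSS classes featureComponent stdPad mBottom10. If you need help, feel free to ask a separate question.
Thank you very much. I'll try it myself first.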
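
The follow-up described in the comments could be sketched roughly as below. This is only an illustration under assumptions, not the answerer's code: the helper names player_links and player_info_text, the BASE_URL constant, and the sample HTML strings are all made up for the example, and it assumes the player hrefs are relative to the site root. The live page structure may differ from what the thread describes.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: player links are relative to the site root.
BASE_URL = "https://www.cbssports.com"


def player_links(search_html, base_url=BASE_URL):
    """Return (name, absolute_url) pairs for each player row of a search page."""
    soup = BeautifulSoup(search_html, "html.parser")
    table = soup.find("table", class_="data")
    if table is None:
        return []
    links = []
    for tr in table.find_all("tr", class_=["row1", "row2"]):
        tds = tr.find_all("td")
        # Skip rows without a link in the first cell
        if not tds or tds[0].a is None:
            continue
        href = tds[0].a.get("href")
        if href:
            links.append((tds[0].a.get_text(strip=True), urljoin(base_url, href)))
    return links


def player_info_text(player_html):
    """Return the text of the player info box, or None if it is not found."""
    soup = BeautifulSoup(player_html, "html.parser")
    # CSS classes as named in the comment above
    box = soup.select_one("div.featureComponent.stdPad.mBottom10")
    if box is None:
        return None
    return box.get_text(" ", strip=True)


# Hypothetical sample markup, standing in for the real pages
SAMPLE_SEARCH = (
    '<table class="data"><tr class="row1">'
    '<td><a href="/nfl/players/1/example-player">Example Player</a></td>'
    "<td>RB</td><td>XYZ</td></tr></table>"
)
SAMPLE_PLAYER = '<div class="featureComponent stdPad mBottom10">Height: 6-0 Weight: 200</div>'

for name, url in player_links(SAMPLE_SEARCH):
    print(name, url)
print(player_info_text(SAMPLE_PLAYER))
```

To use this against the live site, you would fetch each page with requests.get(url).text and pass the resulting HTML to these functions, ideally with a short pause between requests since there is one extra request per player.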