Python web scraping: extracting attribute values, rather than text values, from the td cells of an entire table
I'm trying to pull some data from a table, but what I actually want is stored in the cells' attributes rather than in their text. XML example: ''' '''
In my current code I get a good-looking DataFrame containing the headers and everything that is visible when viewing the table. However, I want the table to show "Out: Concussion" instead of "O". I have tried many approaches but can't work it out. Please let me know whether my current approach can work or whether I'm going about this completely wrong.
This should help:
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com/teams/atl/2017_injuries.htm'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
table = soup.find('table', attrs={'class': 'sortable', 'id': 'team_injuries'})
table_rows = table.find_all('tr')

final_data = []
for tr in table_rows:
    cells = tr.find_all(['th', 'td'])
    # Use the data-tip attribute when a cell has one; otherwise fall back to its visible text
    row = [cell['data-tip'] if cell.has_attr('data-tip') else cell.text for cell in cells]
    final_data.append(row)

# The first row holds the headers; the remaining rows are the data
df = pd.DataFrame(final_data[1:], columns=final_data[0])
df.to_csv("D:\\injuries.csv", index=False)
Screenshot of the CSV file (I did some formatting to make it look neat):
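Because it doesn't give me the information I want. I'm looking to pull the attributes out, not the text.
This is huge! I can't believe I missed such a small step. Thank you so much.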
# Original code from the question: it keeps only the visible text of each cell
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = 'https://www.pro-football-reference.com/teams/atl/2017_injuries.htm'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
table = soup.find('table', attrs={'class': 'sortable', 'id': 'team_injuries'})
table_rows = table.find_all('tr')

final_data = []
for tr in table_rows:
    cells = tr.find_all(['th', 'td'])
    row = [cell.text for cell in cells]
    final_data.append(row)

# The first row holds the headers, so pass it as the column labels
df = pd.DataFrame(final_data[1:], columns=final_data[0])
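The two versions differ in a single expression per cell: one reads the cell's text, the other prefers its data-tip attribute. Below is a minimal offline sketch of that difference. The HTML fragment, the player names and the data-tip values are invented for illustration and are not taken from the site, and the helper names cell_value and table_to_frame are hypothetical. It uses the built-in html.parser instead of lxml only so that it runs without extra dependencies.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical fragment: the markup and values are made up for illustration.
HTML = """
<table id="team_injuries">
  <tr><th>Player</th><th>Week 1</th><th>Week 2</th></tr>
  <tr><th>Player A</th><td data-tip="Out: Concussion">O</td><td></td></tr>
  <tr><th>Player B</th><td>Q</td><td data-tip="Questionable: Knee">Q</td></tr>
</table>
"""


def cell_value(cell):
    """Return the data-tip attribute if the cell has one, else its visible text."""
    if cell.has_attr("data-tip"):
        return cell["data-tip"]
    return cell.get_text(strip=True)


def table_to_frame(html, table_id):
    """Build a DataFrame from the table with the given id, one row per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    rows = [
        [cell_value(cell) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    # The first row is the header row; the rest are data rows.
    return pd.DataFrame(rows[1:], columns=rows[0])


df = table_to_frame(HTML, "team_injuries")
print(df)
```

Cells without a data-tip attribute keep their visible text, so header cells and empty cells pass through unchanged.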