Python 3.x BeautifulSoup4 returns an empty list when trying to scrape tables
I'm trying to extract data from this URL (the one passed to `parse_url` below), but whenever I try to access the tables I get the same error. My code is below. I run it, then run `hp = HTMLTableParser()` and `table = hp.parse_url('https://www.winstonslab.com/players/player.php?id=98')[0][1]`, which fails with the error "index 0 is out of bounds for axis 0 with size 0".
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup


class HTMLTableParser:
    def parse_url(self, url):
        response = requests.get(url)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.text, 'lxml')
        # table.get('id') returns None for tables without an id instead of raising KeyError
        return [(table.get('id'), self.parse_html_table(table))
                for table in soup.find_all('table')]

    def parse_html_table(self, table):
        n_columns = 0
        n_rows = 0
        column_names = []

        # Find number of rows and columns
        # we also find the column titles if we can
        for row in table.find_all('tr'):
            # Determine the number of rows in the table
            td_tags = row.find_all('td')
            if len(td_tags) > 0:
                n_rows += 1
                if n_columns == 0:
                    # Set the number of columns for our table
                    n_columns = len(td_tags)

            # Handle column names if we find them
            th_tags = row.find_all('th')
            if len(th_tags) > 0 and len(column_names) == 0:
                for th in th_tags:
                    column_names.append(th.get_text())

        # Safeguard on Column Titles
        if len(column_names) > 0 and len(column_names) != n_columns:
            raise Exception("Column titles do not match the number of columns")

        columns = column_names if len(column_names) > 0 else range(0, n_columns)
        df = pd.DataFrame(columns=columns, index=range(0, n_rows))

        # Fill the DataFrame cell by cell from the <td> elements
        row_marker = 0
        for row in table.find_all('tr'):
            column_marker = 0
            columns = row.find_all('td')
            for column in columns:
                df.iat[row_marker, column_marker] = column.get_text()
                column_marker += 1
            if len(columns) > 0:
                row_marker += 1

        # Convert to float if possible
        for col in df:
            try:
                df[col] = df[col].astype(float)
            except ValueError:
                pass

        return df
```
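One way to narrow down an empty result is to check what the raw HTML actually contains before indexing into the list. A common reason `find_all('table')` comes back empty is that the tables are not present in the HTML that `requests` receives (for example, when a script fills them in after the page loads); whether that applies to this particular page is not verified here. The sketch below uses two made-up markup strings and a hypothetical helper `list_tables`, and uses the built-in `html.parser` so it runs without lxml.

```python
from typing import List, Optional, Tuple

from bs4 import BeautifulSoup


def list_tables(html: str) -> List[Tuple[Optional[str], int]]:
    """Return (id, number of <tr> rows) for each <table> in the raw HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # .get() returns None instead of raising KeyError when a table has no id
    return [(t.get("id"), len(t.find_all("tr"))) for t in soup.find_all("table")]


# Hypothetical markup: the first has only a placeholder that a script might fill in later.
js_page = '<div id="stats"></div><script>/* builds the content client-side */</script>'
static_page = "<table><tr><td>a</td></tr></table>"

for name, html in [("js_page", js_page), ("static_page", static_page)]:
    found = list_tables(html)
    if not found:
        # Indexing found[0] here would raise IndexError, so report instead.
        print(f"{name}: no <table> in the raw HTML")
    else:
        print(f"{name}: {found}")
```

If the real page's response shows no `<table>` elements, no amount of parsing will find them; the data would have to come from wherever the page's scripts load it.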
If all you need from the page is the tables, you can use the `pandas.read_html()` function instead.
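A minimal sketch of that suggestion, run against a small made-up HTML string so it works offline. `read_tables` and `SAMPLE_HTML` are hypothetical names, and lxml is assumed to be installed (the question's code already uses it). `pandas.read_html` returns a list of DataFrames, one per `<table>`, and raises `ValueError` when it finds none, so the wrapper returns an empty list in that case rather than letting a later `[0]` fail.

```python
from io import StringIO
from typing import List

import pandas as pd

# Made-up markup standing in for the real page (hypothetical data).
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Hero</th><th>Maps</th></tr>
  <tr><td>Tracer</td><td>12</td></tr>
  <tr><td>Genji</td><td>7</td></tr>
</table>
"""


def read_tables(html: str) -> List[pd.DataFrame]:
    """Parse every <table> in `html` into a DataFrame (hypothetical helper)."""
    try:
        # StringIO wraps literal HTML (newer pandas deprecates passing raw strings).
        # flavor="lxml" uses only lxml, so no tables -> ValueError, not a fallback parser.
        return pd.read_html(StringIO(html), flavor="lxml")
    except ValueError:
        # read_html raises ValueError when no <table> is found.
        return []


tables = read_tables(SAMPLE_HTML)
print(len(tables))
print(tables[0])

# For the real page, read_html also accepts a URL directly:
# tables = pd.read_html("https://www.winstonslab.com/players/player.php?id=98")
```

Rows made entirely of `<th>` cells at the top of a table are used as the column header, so no manual header handling is needed here.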
Look at which parameters the function accepts and at the page's HTML structure, and experiment to see whether you can read the missing data.
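Two `read_html` parameters that are often useful for picking out one specific table are `attrs` (keep tables whose HTML attributes match, such as an `id`) and `match` (keep tables whose text matches a string or regex). The sketch below runs on made-up markup; the table ids and contents are hypothetical, and lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; ids and contents are made up.
HTML = """
<table id="summary">
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>Team</td><td>X</td></tr>
</table>
<table id="matches">
  <tr><th>Map</th><th>Result</th></tr>
  <tr><td>Ilios</td><td>Win</td></tr>
</table>
"""

# attrs keeps only tables whose HTML attributes match (here, the id).
by_id = pd.read_html(StringIO(HTML), attrs={"id": "matches"})

# match keeps only tables whose text matches the given string or regex.
by_text = pd.read_html(StringIO(HTML), match="Win")

print(len(by_id), list(by_id[0].columns))
print(by_text[0])
```

On the real page, inspecting the markup in the browser's developer tools is one way to find an `id` or a distinctive piece of text to filter on.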