Python 3.x: BeautifulSoup4 returns an empty list when trying to scrape a table

python-3.x pandas

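I am trying to extract data from this URL: https://www.winstonslab.com/players/player.php?id=98. Whenever I try to access the tables, I get the same error.

My code is below. I run it, then run

hp = HTMLTableParser()

and

table = hp.parse_url('https://www.winstonslab.com/players/player.php?id=98')[0][1]

which fails with the error "index 0 is out of bounds for axis 0 with size 0".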

import requests
import pandas as pd
from bs4 import BeautifulSoup

class HTMLTableParser:

    def parse_url(self, url):
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'lxml')
        return [(table['id'],self.parse_html_table(table))\
                for table in soup.find_all('table')]  

    def parse_html_table(self, table):
        n_columns = 0
        n_rows=0
        column_names = []

        # Find number of rows and columns
        # we also find the column titles if we can
        for row in table.find_all('tr'):

            # Determine the number of rows in the table
            td_tags = row.find_all('td')
            if len(td_tags) > 0:
                n_rows+=1
                if n_columns == 0:
                    # Set the number of columns for our table
                    n_columns = len(td_tags)

                # Handle column names if we find them
                th_tags = row.find_all('th') 
                if len(th_tags) > 0 and len(column_names) == 0:
                    for th in th_tags:
                        column_names.append(th.get_text())

            # Safeguard on Column Titles
            if len(column_names) > 0 and len(column_names) != n_columns:
                raise Exception("Column titles do not match the number of columns")

            columns = column_names if len(column_names) > 0 else range(0,n_columns)
            df = pd.DataFrame(columns = columns,
                          index= range(0,n_rows))
            row_marker = 0
            for row in table.find_all('tr'):
                column_marker = 0
                columns = row.find_all('td')
                for column in columns:
                    df.iat[row_marker,column_marker] = column.get_text()
                    column_marker += 1
                if len(columns) > 0:
                    row_marker += 1

            # Convert to float if possible
            for col in df:
                try:
                    df[col] = df[col].astype(float)
                except ValueError:
                    pass

            return df
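
One possible reading of the error, assuming the indentation shown above is exactly what was run: the DataFrame is built and filled inside the first `for row` loop, so it is sized after only one row has been counted. If that first row holds only `th` cells, the frame has zero rows and the first `df.iat[...]` assignment is out of bounds. The sketch below is a restructured variant, not the original author's code: it collects the rows first and builds the DataFrame once the loop has finished. The sample HTML is made up for illustration, and `html.parser` is used only to keep the example free of extra dependencies.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_html_table(table):
    """Convert one <table> Tag into a DataFrame of cell strings."""
    column_names = []
    rows = []

    # Single pass: remember the first set of <th> names and every <td> row.
    for row in table.find_all('tr'):
        th_tags = row.find_all('th')
        if th_tags and not column_names:
            column_names = [th.get_text(strip=True) for th in th_tags]

        td_tags = row.find_all('td')
        if td_tags:
            rows.append([td.get_text(strip=True) for td in td_tags])

    # The table size is only known after the loop has seen every row.
    n_columns = max((len(r) for r in rows), default=0)
    if column_names and rows and len(column_names) != n_columns:
        raise ValueError("Column titles do not match the number of columns")

    columns = column_names if column_names else range(n_columns)
    return pd.DataFrame(rows, columns=columns)


# Made-up sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>alpha</td><td>3</td></tr>
  <tr><td>beta</td><td>5</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
df = parse_html_table(soup.find('table'))
print(df)
```

The cells stay as strings here; the float conversion from the original code could be applied afterwards, column by column.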

If the data you need is just the tables, you can use the pandas.read_html() function to get it.
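
A minimal sketch of that suggestion, assuming `lxml` (or `bs4` plus `html5lib`) is installed so that `pandas.read_html` has a parser to work with. The HTML string is a made-up stand-in for the downloaded page, and the table ids are hypothetical.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the downloaded page; not the real site's markup.
SAMPLE_HTML = """
<table id="scores">
  <thead><tr><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>alpha</td><td>3</td></tr>
    <tr><td>beta</td><td>5</td></tr>
  </tbody>
</table>
<table id="other">
  <thead><tr><th>Label</th></tr></thead>
  <tbody><tr><td>gamma</td></tr></tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
all_tables = pd.read_html(StringIO(SAMPLE_HTML))

# attrs narrows the search to tables whose HTML attributes match.
scores = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "scores"})[0]

print(len(all_tables))
print(scores)
```

For the live page, the same call could be given `StringIO(requests.get(url).text)` instead of the sample string. Note that `read_html` raises a `ValueError` when it finds no matching table, which is a clearer signal than an index error further down.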

You will need to look at which parameters the function offers and at the structure of the HTML. Experiment with them and see whether you can read the missing data.
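
To look at the HTML structure, one option is to list the tables that BeautifulSoup actually sees in the downloaded text before parsing any of them. The helper below is a hypothetical name introduced for illustration, and the sample markup is made up. It uses `table.get('id')` rather than `table['id']`, because the latter raises a `KeyError` for a table that has no `id` attribute.

```python
from bs4 import BeautifulSoup


def summarize_tables(html):
    """Return (id, number of <tr> rows) for every <table> in the HTML text."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        (table.get('id'), len(table.find_all('tr')))
        for table in soup.find_all('table')
    ]


# Made-up sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>alpha</td><td>3</td></tr>
</table>
<table>
  <tr><td>no id on this table</td></tr>
</table>
"""

summary = summarize_tables(SAMPLE_HTML)
print(summary)
```

If the summary for the real page comes back empty, the tables are probably not present in the raw response, for example because they are filled in by JavaScript after the page loads. In that case neither BeautifulSoup nor `read_html` can see them in the text returned by `requests.get`.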