
Python: error when trying to import data from a website using a loop


I am trying to use Python to import data from multiple web pages into a single data table. Basically, I'm trying to download attendance data for certain teams going back to 2000.

Here is what I have so far:

import requests
import pandas as pd
import numpy as np

#What is the effect of a rival team's performance on a team's attendance

Teams = ['LAA', 'LAD', 'NYY', 'NYM', 'CHC', 'CHW', 'OAK', 'SFG']
Years = []
for year in range(2000,2020):
    Years.append(str(year))

bbattend = pd.DataFrame(columns=['GM_Num','Date','Team','Home','Opp','W/L','R','RA','Inn','W-L','Rank','GB','Time','D/N','Attendance','Streak','Game_Win','Wins','Losses','Net_Wins'])

for team in Teams:
    for year in Years:
        url = 'https://www.baseball-reference.com/teams/' + team + '/' + year +'-schedule-scores.shtml'
        html = requests.get(url).content
        df_list = pd.read_html(html)
        df = df_list[-1]

        #Formatting data table
        df.rename(columns={"Gm#": "GM_Num", "Unnamed: 4": "Home", "Tm": "Team", "D/N": "Night"}, inplace = True)
        df['Home'] = df['Home'].apply(lambda x: 0 if x == '@' else 1)
        df['Game_Win'] = df['W/L'].astype(str).str[0]
        df['Game_Win'] = df['Game_Win'].apply(lambda x: 0 if x == 'L' else 1)
        df['Night'] = df['Night'].apply(lambda x: 1 if x == 'N' else 0)
        df['Streak'] = df['Streak'].apply(lambda x: -1*len(x) if '-' in x else len(x))
        df.drop('Unnamed: 2', axis=1, inplace = True)
        df.drop('Orig. Scheduled', axis=1, inplace = True)
        df.drop('Win', axis=1, inplace = True)
        df.drop('Loss', axis=1, inplace = True)
        df.drop('Save', axis=1, inplace = True)
        #Drop rows that do not have data
        df = df[df['GM_Num'].str.isdigit()]
        WL = df["W-L"].str.split("-", n = 1, expand = True)
        df["Wins"] = WL[0].astype(dtype=np.int64)
        df["Losses"] = WL[1].astype(dtype=np.int64)
        df['Net_Wins'] = df['Wins'] - df['Losses']
        bbattend.append(df)

bbattend
It seems to work when I use a specific link on its own instead of trying to build the URL by concatenation inside the loop.

However, with this code I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-77-997e6aeea77e> in <module>
     16         url = 'https://www.baseball-reference.com/teams/' + team + '/' + year +'-schedule-scores.shtml'
     17         html = requests.get(url).content
---> 18         df_list = pd.read_html(html)
     19         df = df_list[-1]
     20         #Formatting data table

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
   1092                   decimal=decimal, converters=converters, na_values=na_values,
   1093                   keep_default_na=keep_default_na,
-> 1094                   displayed_only=displayed_only)

~/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    914             break
    915     else:
--> 916         raise_with_traceback(retained)
    917 
    918     ret = []

~/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
    418         if traceback == Ellipsis:
    419             _, _, traceback = sys.exc_info()
--> 420         raise exc.with_traceback(traceback)
    421 else:
    422     # this version of raise is a syntax error in Python 3

ValueError: No tables found
I don't really understand what the error message is saying.
I would appreciate any help.

Some of the pages don't contain any tables, so

df_list = pd.read_html(html)

will raise

ValueError: No tables found

You need to use try/except here.
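A minimal sketch of that fix, assuming the loop body otherwise stays as in the question. The frames list and pd.concat at the end are my substitution for the original bbattend.append(df), whose return value was being discarded (DataFrame.append returns a new frame rather than modifying in place, and is removed in pandas 2.0):

import requests
import pandas as pd

frames = []  # collect each team/year table here
for team in Teams:
    for year in Years:
        url = 'https://www.baseball-reference.com/teams/' + team + '/' + year + '-schedule-scores.shtml'
        html = requests.get(url).content
        try:
            df_list = pd.read_html(html)
        except ValueError:
            # "No tables found" -- e.g. a 404 page; skip this team/year
            continue
        df = df_list[-1]
        # ...same formatting steps as in the question...
        frames.append(df)

# build the combined table once, instead of bbattend.append(df)
bbattend = pd.concat(frames, ignore_index=True)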

One or more of the URLs has no table; you can try wrapping the call in try:/except:.
The URL https://www.baseball-reference.com/teams/LAA/2000-schedule-scores.shtml returns a 404, so there is no table on that page.

Thanks! I hadn't realized that.
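Since the failing pages are 404s, an alternative sketch (my variation, not from the answers above) is to check the HTTP status before parsing; requests does not raise on a 404 by itself:

# inside the question's inner loop, replacing the requests/read_html lines
resp = requests.get(url)
if resp.status_code != 200:
    continue  # e.g. the LAA/2000 page above returns 404
df_list = pd.read_html(resp.content)
df = df_list[-1]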