
Python: is there a better way to read many HTML URLs?


I need to run the same series of commands for a bunch of URLs and `Sett.` values.

How can I make the code below cleaner?

import pandas as pd

url_ita_a = ('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A')
df_ita_a = pd.read_html(url_ita_a)[0]
df2_ita_a = df_ita_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_ita_a = df2_ita_a[(df2_ita_a['Punteggio'] == '0–0') & (df2_ita_a['Sett.'] > 15)]

url_tur_a = ('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig')
df_tur_a = pd.read_html(url_tur_a)[0]
df2_tur_a = df_tur_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_tur_a = df2_tur_a[(df2_tur_a['Punteggio'] == '0–0') & (df2_tur_a['Sett.'] > 15)]

...
...
...

url_rom_a = ('https://fbref.com/it/comp/47/calendario/Risultati-e-partite-di-Liga-I')
df_rom_a = pd.read_html(url_rom_a)[0]
df2_rom_a = df_rom_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_rom_a = df2_rom_a[(df2_rom_a['Punteggio'] == '0–0') & (df2_rom_a['Sett.'] > 13)]

url_ind_a = ('https://fbref.com/it/comp/82/calendario/Risultati-e-partite-di-Indian-Super-League')
df_ind_a = pd.read_html(url_ind_a)[0]
df2_ind_a = df_ind_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_ind_a = df2_ind_a[(df2_ind_a['Punteggio'] == '0–0') & (df2_ind_a['Sett.'] > 10)]

frames = [zero_df_ita_a, zero_df_tur_a, ... zero_df_rom_a,
          zero_df_ind_a]

result = pd.concat(frames)
  • Put all the URLs and their `Sett.` values in a list of tuples
  • Iterate over them with a for loop
  • Code below:

    import pandas as pd
    
    urls = [('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A', 15),
            ('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig', 15),
            ('https://fbref.com/it/comp/12/calendario/Risultati-e-partite-di-La-Liga', 13),
            ('https://fbref.com/it/comp/13/calendario/Risultati-e-partite-di-Ligue-1', 13),
            ('https://fbref.com/it/comp/20/calendario/Risultati-e-partite-di-Bundesliga', 13)
            ] # Add all URLs and their 'Sett.' values
    
    frames = []
    
    for url, sett in urls:
        df2 = pd.read_html(url)[0][['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
        frames.append(df2[(df2['Punteggio'] == '0–0') & (df2['Sett.'] > sett)]) # the per-URL 'sett' threshold is applied here
    
    result = pd.concat(frames)
    
    I haven't added all the URLs to the list by hand, since there are too many of them.


    Put the DFs in a dictionary and do it all in one loop.

    Very good answer. One suggestion: unpack the tuples directly in the loop header, `for url, sett in urls:`. That saves a line.

    Wow, I didn't know that was possible, thanks! :) Please mark the answer with the green arrow so it gets accepted! :)
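One of the comments above suggests keeping the DataFrames in a dictionary and doing everything in one loop. A minimal sketch of that variant (the league keys and the `zero_draws`/`collect` helper names are illustrative, not from the original; the filtering logic mirrors the answer's loop):

```python
import pandas as pd

COLS = ['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']

def zero_draws(df, min_week):
    """Keep rows that ended 0–0 after the given matchweek."""
    df = df[COLS]
    return df[(df['Punteggio'] == '0–0') & (df['Sett.'] > min_week)]

def collect(leagues):
    """leagues maps a short name to a (url, sett) pair; returns one combined frame."""
    dfs = {name: zero_draws(pd.read_html(url)[0], sett)
           for name, (url, sett) in leagues.items()}
    return pd.concat(dfs.values())

# Example mapping (same first two URLs as the question; add the rest the same way):
leagues = {
    'ita_a': ('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A', 15),
    'tur_a': ('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig', 15),
}
# result = collect(leagues)
```

Keeping the frames in a dictionary also lets you look up a single league's result by name afterwards, which the flat `frames` list does not.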