Python: is there a better way to read many HTML URLs?
I need to run the same series of commands for a bunch of URLs and `Sett.` values. How can I make the code below cleaner?
import pandas as pd
url_ita_a = ('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A')
df_ita_a = pd.read_html(url_ita_a)[0]
df2_ita_a = df_ita_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_ita_a = df2_ita_a[(df2_ita_a['Punteggio'] == '0–0') & (df2_ita_a['Sett.'] > 15)]
url_tur_a = ('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig')
df_tur_a = pd.read_html(url_tur_a)[0]
df2_tur_a = df_tur_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_tur_a = df2_tur_a[(df2_tur_a['Punteggio'] == '0–0') & (df2_tur_a['Sett.'] > 15)]
...
...
...
url_rom_a = ('https://fbref.com/it/comp/47/calendario/Risultati-e-partite-di-Liga-I')
df_rom_a = pd.read_html(url_rom_a)[0]
df2_rom_a = df_rom_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_rom_a = df2_rom_a[(df2_rom_a['Punteggio'] == '0–0') & (df2_rom_a['Sett.'] > 13)]
url_ind_a = ('https://fbref.com/it/comp/82/calendario/Risultati-e-partite-di-Indian-Super-League')
df_ind_a = pd.read_html(url_ind_a)[0]
df2_ind_a = df_ind_a[['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
zero_df_ind_a = df2_ind_a[(df2_ind_a['Punteggio'] == '0–0') & (df2_ind_a['Sett.'] > 10)]
frames = [zero_df_ita_a, zero_df_tur_a, ... zero_df_rom_a,
zero_df_ind_a]
result = pd.concat(frames)
Put the URLs and the corresponding `Sett.` values in a list of tuples and iterate over them with a for loop:

import pandas as pd
urls = [('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A', 15),
('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig', 15),
('https://fbref.com/it/comp/12/calendario/Risultati-e-partite-di-La-Liga', 13),
('https://fbref.com/it/comp/13/calendario/Risultati-e-partite-di-Ligue-1', 13),
('https://fbref.com/it/comp/20/calendario/Risultati-e-partite-di-Bundesliga', 13)
        ]  # add all URLs and their 'Sett.' thresholds here
frames = []
for url, sett in urls:
    df2 = pd.read_html(url)[0][['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']]
    frames.append(df2[(df2['Punteggio'] == '0–0') & (df2['Sett.'] > sett)])  # here the 'sett' value comes in handy
result = pd.concat(frames)
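If the same pattern is needed in several places, the loop body can also be factored into small helpers. This is only a sketch; the names `zero_draws` and `collect_zero_draws` are my own, not from the original answer:

```python
import pandas as pd

COLS = ['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']

def zero_draws(df, min_week):
    """Keep only the 0–0 draws played after the given matchweek."""
    df = df[COLS]
    return df[(df['Punteggio'] == '0–0') & (df['Sett.'] > min_week)]

def collect_zero_draws(urls):
    """Fetch each fixtures table and concatenate the filtered rows.

    `urls` is a list of (url, min_week) tuples, as in the answer above.
    """
    frames = [zero_draws(pd.read_html(url)[0], week) for url, week in urls]
    return pd.concat(frames, ignore_index=True)
```

Separating the filter from the download also makes the filtering logic easy to test on an in-memory DataFrame, without hitting the network.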
I did not add all of the URLs to the list manually, because there are too many of them.
You could also put the DFs in a dictionary and build them all in one loop. Very good answer.
Good answer. One suggestion: unpack the tuples directly in the loop header, `for url, sett in urls:`, instead of indexing into each tuple inside the loop. One line fewer.
Wow, I did not know that was possible, thanks! :) Please accept the answer with the green arrow! :)
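The dictionary idea from the first comment could be sketched like this. The league keys here are hypothetical examples, and `filter_zero` simply mirrors the filtering step from the answer:

```python
import pandas as pd

COLS = ['Sett.', 'Data', 'Casa', 'Punteggio', 'Ospiti']

def filter_zero(df, min_week):
    # keep only the 0–0 draws after the given matchweek
    df = df[COLS]
    return df[(df['Punteggio'] == '0–0') & (df['Sett.'] > min_week)]

def collect(leagues):
    # leagues maps a name to a (url, min_week) tuple;
    # one loop builds a dict of filtered DataFrames keyed by league
    return {name: filter_zero(pd.read_html(url)[0], week)
            for name, (url, week) in leagues.items()}

# example mapping (same URLs and thresholds as the answer's list)
leagues = {
    'ita_a': ('https://fbref.com/it/comp/11/calendario/Risultati-e-partite-di-Serie-A', 15),
    'tur_a': ('https://fbref.com/it/comp/26/calendario/Risultati-e-partite-di-Super-Lig', 15),
}
# result = pd.concat(collect(leagues))  # league names become an outer index level
```

Passing a dict to `pd.concat` uses its keys as the outer level of a MultiIndex, so each row keeps a label saying which league it came from.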