Python 3.x BeautifulSoup, requests, DataFrame: scrape a table and save to Excel
Python newbie here again! Two questions:

1. How do I save all of this data into a single Excel sheet named "Summary" instead of multiple tabs? Currently each year is saved to a tab named after the year.
2. The 'div' with class_="sidearm-schedule-game-result" returns game results in the format "W, 1-0". How can I split "W, 1-0" into two columns, one containing "W" and the next containing "1-0"?

Many thanks.
import requests
import pandas as pd
from pandas import ExcelWriter
from bs4 import BeautifulSoup

year_id = ['2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019']
lehigh_url = 'https://lehighsports.com/sports/mens-soccer/schedule/'
results = []

with requests.Session() as req:
    for year in range(2003, 2020):
        print(f"Extracting Year# {year}")
        url = req.get(f"{lehigh_url}{year}")
        if url.status_code == 200:
            soup = BeautifulSoup(url.text, 'lxml')
            rows = soup.find_all('div', class_="sidearm-schedule-game-row flex flex-wrap flex-align-center row")
            sheet = pd.DataFrame()
            for row in rows:
                date = row.find('div', class_="sidearm-schedule-game-opponent-date").text.strip()
                name = row.find('div', class_="sidearm-schedule-game-opponent-name").text.strip()
                opp = row.find('div', class_="sidearm-schedule-game-opponent-text").text.strip()
                conf = row.find('div', class_="sidearm-schedule-game-conference-conference").text.strip()
                try:
                    result = row.find('div', class_="sidearm-schedule-game-result").text.strip()
                except AttributeError:  # no result div for this game
                    result = ''
                df = pd.DataFrame([[year, date, name, opp, conf, result]],
                                  columns=['year', 'date', 'opponent', 'list', 'conference', 'result'])
                sheet = pd.concat([sheet, df], sort=True).reset_index(drop=True)
            results.append(sheet)

def save_xls(list_dfs, xls_path):
    # one tab per year -- this is what I want to change to a single "Summary" sheet
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer, sheet_name=year_id[n], index=False)

save_xls(results, 'lehigh.xlsx')
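For question 1, a minimal sketch of the single-sheet approach, using hypothetical stand-in frames rather than the scraped schedule: concatenate the per-year DataFrames into one frame, then make a single `to_excel` call with `sheet_name='Summary'`.

```python
import pandas as pd

# stand-in for the per-year frames collected in `results` (toy data)
results = [
    pd.DataFrame({'year': ['2003'], 'result': ['W, 1-0']}),
    pd.DataFrame({'year': ['2004'], 'result': ['L, 0-2']}),
]

# one frame containing every year's rows, with a fresh index
summary = pd.concat(results, ignore_index=True)

# a single sheet named "Summary" instead of one tab per year
with pd.ExcelWriter('lehigh.xlsx') as writer:
    summary.to_excel(writer, sheet_name='Summary', index=False)
```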
Instead of building a list of DataFrames, you can append each sheet into a single DataFrame and write that to file with pandas. Then, to split into two columns, just use .str.split and split on the comma:
import requests
import pandas as pd
from bs4 import BeautifulSoup

year_id = ['2019','2018','2017','2016','2015','2014','2013','2012','2011','2010','2009','2008','2007','2006','2005','2004','2003']
results = pd.DataFrame()

for year in year_id:
    url = 'https://lehighsports.com/sports/mens-soccer/schedule/' + year
    print(url)
    lehigh = requests.get(url).text
    soup = BeautifulSoup(lehigh, 'lxml')
    rows = soup.find_all('div', class_="sidearm-schedule-game-row flex flex-wrap flex-align-center row")
    sheet = pd.DataFrame()
    for row in rows:
        date = row.find('div', class_="sidearm-schedule-game-opponent-date").text.strip()
        name = row.find('div', class_="sidearm-schedule-game-opponent-name").text.strip()
        opp = row.find('div', class_="sidearm-schedule-game-opponent-text").text.strip()
        conf = row.find('div', class_="sidearm-schedule-game-conference-conference").text.strip()
        try:
            result = row.find('div', class_="sidearm-schedule-game-result").text.strip()
        except AttributeError:  # no result posted for this game
            result = ''
        df = pd.DataFrame([[year, date, name, opp, conf, result]],
                          columns=['year', 'date', 'opponent', 'list', 'conference', 'result'])
        sheet = pd.concat([sheet, df], sort=True).reset_index(drop=True)
    # accumulate every year into one DataFrame instead of a list
    results = pd.concat([results, sheet], sort=True).reset_index(drop=True)

# split "W, 1-0" on the first comma into a result column and a score column
results[['result', 'score']] = results['result'].str.split(',', n=1, expand=True)
results.to_excel('lehigh.xlsx', sheet_name='Summary', index=False)
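The split step on its own, as a minimal sketch with toy data (not the scraped schedule): `n=1` limits the split to the first comma, and `expand=True` returns the pieces as separate columns instead of a single column of lists.

```python
import pandas as pd

df = pd.DataFrame({'result': ['W, 1-0', 'L, 0-2', 'T, 1-1']})

# n=1: split on the first comma only; expand=True: one column per piece
df[['outcome', 'score']] = df['result'].str.split(',', n=1, expand=True)

# the score piece keeps the space that followed the comma, so strip it
df['score'] = df['score'].str.strip()
```

After this, `df['outcome']` holds W/L/T and `df['score']` holds the scoreline.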
Comments: "Here's a tutorial on splitting a column into a list. Do they all have the same columns?" "This is great! I hope I can reach your level of Python one day. Any suggestions for getting up to speed like this? For data science I've used datacamp.com and dataquest.io, and I'm looking for something similar for Python." "As for web scraping, I learned it through trial and error."