Python pandas: efficiently applying a function that takes the whole DataFrame as input
I have a pandas DataFrame that models product purchases by date. I would like to add features such as how many purchases happened yesterday, over the last week, and so on. Is there an elegant and efficient way to do this? Right now I am using a loop, which takes a lot of time.

Given the data:
import pandas as pd, numpy as np
dico = {"dates":["2017-11-20"]*3+["2017-11-21"]*3+ ["2017-11-22"]*3, "product":["A", "B", "C"]*3, "sales": np.arange(1,10)}
df = pd.DataFrame.from_dict(dico)
df["dates"] = pd.to_datetime(df.dates)
To get the previous day's sales and the sum of sales over the previous two days, I do the following:
one_day = pd.to_timedelta(1, unit='d')
two_days = pd.to_timedelta(2, unit='d')
yesterday_sales, last_two_days_sales = [], []
for _, row in df.iterrows():
    yesterday_performance = df.loc[(df["product"] == row["product"]) & (df.dates == (row["dates"] - one_day))]
    if yesterday_performance.shape[0] == 1:
        yesterday_sales.append(yesterday_performance.sales.values[0])
    else:
        yesterday_sales.append(-1)
    two_days_sales = df.loc[(df["product"] == row["product"]) & (df["dates"] >= (row["dates"] - two_days)) & (df["dates"] < (row["dates"]))]
    if two_days_sales.shape[0] >= 1:
        last_two_days_sales.append(two_days_sales.sales.sum())
    else:
        last_two_days_sales.append(-1)
df["yesterday_sales"] = yesterday_sales
df["last_two_days_sales"] = last_two_days_sales
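Everything inside the loop is slow, but I can't think of a better way.

I simplified your code. It is still not vectorized, but if performance is not a concern it should be easier to maintain: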
def one_day(row):
    yday_perf = df.loc[(df['product'] == row['product']) & (df['dates'] == (row['dates'] + pd.Timedelta(days=-1))), 'sales']
    return yday_perf.values[0] if not yday_perf.empty else -1

def two_day(row):
    twoday_perf = df.loc[(df['product'] == row['product']) & (df['dates'] >= (row['dates'] + pd.Timedelta(days=-2))) & (df['dates'] < row['dates']), 'sales']
    return twoday_perf.sum() if len(twoday_perf) >= 1 else -1

df['yesterday_sales'] = df.apply(one_day, axis=1)
df['last_two_days_sales'] = df.apply(two_day, axis=1)
#        dates product  sales  yesterday_sales  last_two_days_sales
# 0 2017-11-20       A      1               -1                   -1
# 1 2017-11-20       B      2               -1                   -1
# 2 2017-11-20       C      3               -1                   -1
# 3 2017-11-21       A      4                1                    1
# 4 2017-11-21       B      5                2                    2
# 5 2017-11-21       C      6                3                    3
# 6 2017-11-22       A      7                4                    5
# 7 2017-11-22       B      8                5                    7
# 8 2017-11-22       C      9                6                    9
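This is helpful. Is there any way to make it faster?

You could try merging the two apply calls into one and assigning to both Series at the same time. Your function would have to return a tuple.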
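
The suggestion above could look roughly like the sketch below. It is not code from the thread: the helper names `both_features` and `sales_n_days_ago` are made up for illustration. The first part does a single `apply` pass that returns a tuple, expanded into two columns with `result_type="expand"`. The second part is a different technique, a merge-based vectorized variant, and it assumes there is at most one row per product per day. Neither variant has been benchmarked here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dates": pd.to_datetime(["2017-11-20"] * 3 + ["2017-11-21"] * 3 + ["2017-11-22"] * 3),
    "product": ["A", "B", "C"] * 3,
    "sales": np.arange(1, 10),
})

# Part 1: one apply pass that returns both features as a tuple.
def both_features(row):
    """Hypothetical helper: (yesterday_sales, last_two_days_sales), -1 when no history."""
    same_product = df["product"] == row["product"]
    yday = df.loc[same_product & (df["dates"] == row["dates"] - pd.Timedelta(days=1)), "sales"]
    window = (df["dates"] >= row["dates"] - pd.Timedelta(days=2)) & (df["dates"] < row["dates"])
    two_day = df.loc[same_product & window, "sales"]
    return (yday.iloc[0] if not yday.empty else -1,
            two_day.sum() if not two_day.empty else -1)

# result_type="expand" turns each returned tuple into columns 0 and 1.
features = df.apply(both_features, axis=1, result_type="expand")
df["yesterday_sales"], df["last_two_days_sales"] = features[0], features[1]

# Part 2: vectorized variant (assumes at most one row per product per day).
base = df[["dates", "product", "sales"]]

def sales_n_days_ago(n):
    """Hypothetical helper: sales of the same product n days earlier, NaN if missing."""
    shifted = base.assign(dates=base["dates"] + pd.Timedelta(days=n))
    merged = base[["dates", "product"]].merge(shifted, on=["dates", "product"], how="left")
    return merged["sales"].to_numpy()

d1, d2 = sales_n_days_ago(1), sales_n_days_ago(2)
vec_yesterday = np.where(np.isnan(d1), -1, d1).astype(int)
vec_two_days = np.where(np.isnan(d1) & np.isnan(d2), -1,
                        np.nan_to_num(d1) + np.nan_to_num(d2)).astype(int)
```

The single `apply` still evaluates a boolean mask over the whole frame for every row, so it should mainly save the overhead of the second pass. The merge-based variant avoids the per-row Python loop entirely, which is usually where larger gains come from on bigger frames.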