Pandas: accumulating data for linear regression
I'm trying to adjust my data so that the total gross amount is accumulated day by day. For example:
Created   total_gross   total_gross_accumulated
Day 1     100           100
Day 2     100           200
Day 3     100           300
Day 4     100           400
Do you know how I can change my code so that the accumulated total becomes available?
This is my data.
My code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

def load_event_data():
    # Read only the two columns needed and parse the timestamps
    df = pd.read_csv('sample-data.csv', usecols=['created', 'total_gross'])
    df['created'] = pd.to_datetime(df.created)
    # Sum total_gross per calendar day
    return df.set_index('created').resample('D').sum().fillna(0)

event_data = load_event_data()
X = event_data.index
y = event_data.total_gross
plt.xticks(rotation=90)
plt.plot(X, y)
plt.show()
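A list comprehension is one way to do this, although the built-in cumsum() is faster.
Short answer:
This gives you the new column you need: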
n = event_data.shape[0]
# for each i in 1..n, sum the first i rows to get the running total
total_gross_accumulated = [event_data['total_gross'].iloc[:i].sum() for i in range(1, n + 1)]
# add the new variable to the initial pandas dataframe
event_data['total_gross_accumulated'] = total_gross_accumulated
Or, faster:
event_data['total_gross_accumulated'] = event_data['total_gross'].cumsum()
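To see that the two variants agree, here is a minimal sketch on toy data that mirrors the Day 1 to Day 4 table from the question. The frame name `toy` and the dates are made up for illustration; only `cumsum()` and the list comprehension come from the answer.

```python
import pandas as pd

# Toy data mirroring the example table in the question (illustrative values)
toy = pd.DataFrame(
    {"total_gross": [100, 100, 100, 100]},
    index=pd.date_range("2019-03-01", periods=4, freq="D", name="created"),
)

# Running total via the list comprehension: sum of the first i rows, for i = 1..n
n = toy.shape[0]
via_loop = [toy["total_gross"].iloc[:i].sum() for i in range(1, n + 1)]

# Running total via the built-in cumulative sum
toy["total_gross_accumulated"] = toy["total_gross"].cumsum()

print(toy)
print(via_loop == toy["total_gross_accumulated"].tolist())
```

The list comprehension re-sums a growing slice for every row, so its cost grows quadratically with the number of days, while `cumsum()` makes a single pass.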
Long answer: complete code using your data:
import pandas as pd
import matplotlib.pyplot as plt

def load_event_data():
    df = pd.read_csv('sample-data.csv', usecols=['created', 'total_gross'])
    df['created'] = pd.to_datetime(df.created)
    return df.set_index('created').resample('D').sum().fillna(0)

event_data = load_event_data()
n = event_data.shape[0]
# for each i in 1..n, sum the first i rows to get the running total
total_gross_accumulated = [event_data['total_gross'].iloc[:i].sum() for i in range(1, n + 1)]
# add the new variable to the initial pandas dataframe
event_data['total_gross_accumulated'] = total_gross_accumulated
Result:
event_data.head(6)
#             total_gross  total_gross_accumulated
# created
# 2019-03-01      3481810                  3481810
# 2019-03-02         4690                  3486500
# 2019-03-03            0                  3486500
# 2019-03-04            0                  3486500
# 2019-03-05            0                  3486500
# 2019-03-06            0                  3486500
X = event_data.index
y = event_data.total_gross_accumulated
plt.xticks(rotation=90)
plt.plot(X, y)
plt.show()
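Since the question is heading towards a linear regression, here is a rough sketch of fitting a straight line to an accumulated series. It is not part of the original answer. It uses toy values instead of sample-data.csv, and it swaps in `numpy.polyfit` for the `sklearn.linear_model` module imported in the question, only to keep the example dependency-light. The names `daily`, `accumulated`, `days`, and `fitted` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy daily totals (illustrative values, not the real sample data)
daily = pd.Series(
    [100, 100, 100, 100],
    index=pd.date_range("2019-03-01", periods=4, freq="D"),
    name="total_gross",
)
accumulated = daily.cumsum()

# A regression needs numeric inputs, so convert dates to days since the first date
days = (accumulated.index - accumulated.index[0]).days.to_numpy()

# Least-squares fit of a degree-1 polynomial: accumulated ~ slope * days + intercept
slope, intercept = np.polyfit(days, accumulated.to_numpy(), deg=1)
fitted = slope * days + intercept

print(slope, intercept)
```

The same `days` array, reshaped to a single column, could presumably be passed to a scikit-learn regressor instead, if you prefer to stay with the library from the question.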
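Great, that works. Thank you, Serafeim. I just learned about this command here:
event_data['cumsum'] = event_data['total_gross'].cumsum()
It actually takes only one line. Is there anything that speaks against this approach?

Yes, I have already included that in my answer. The two are equivalent. Your approach looks fine. Consider my answer. Cheers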