SQL sum() over() equivalent in Python (pandas)

python-3.x pandas

I have a DataFrame that holds sales values recorded for different months across different years. I want to add two new columns that sum the sales values by month and year, and by year.

In the result I am after, the new column Tot Sales/Month holds the sum of Sales over each month-and-year combination (for example, every January 2017 row carries the January 2017 total), and Tot Sales/Year holds the sum of Sales over each year.

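I know this is easy to do in SQL with window functions, but I don't know how to do it in pandas.

My attempts are shown below:

df.groupby(['Month', 'Year'])['Sales'].sum()
df.groupby('Year')['Sales'].sum()

Both statements do produce the values I need, but how can I store those values as columns in the same DataFrame?

Any help is much appreciated.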

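You can use transform to get the columns you need, as shown below:

import pandas as pd

df = pd.DataFrame([('89825870', '1', '2017'), ('248494100', '1', '2017'), ('216344700', '2', '2017'), ('209009300', '3', '2017'), ('204138200', '3', '2017'), ('12456789', '1', '2018'), ('109876543', '1', '2018')], columns=('Sales', 'Month', 'Year'))
# Sales is entered as strings here, so convert it to integers first
# (np.int was removed in NumPy 1.24; the built-in int does the same job)
df["Sales"] = df["Sales"].astype(int)

# Select "Sales" explicitly: without it, transform returns every non-key column,
# and the assignment raises a ValueError on frames that have more columns than this one
df["sales/month"] = df.groupby(["Month", "Year"])["Sales"].transform("sum")
df["sales/year"] = df.groupby("Year")["Sales"].transform("sum")
df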

The following code should give you the result you expect:

import pandas as pd

df = pd.DataFrame([
[89825870, 1, 2017],
[248494100, 1, 2017],
[216344700, 2, 2017],
[209009300, 3, 2017],
[204138200, 3, 2017],
[12456789, 1, 2018],
[109876543, 1, 2018]],columns=["Sales", "Month", "Year"])

df["Tot Sales/Month"] = df.groupby(["Month", "Year"])["Sales"].transform("sum")
df["Tot Sales/Year"] = df.groupby("Year")["Sales"].transform("sum")

The result will then be:

>>> df
       Sales  Month  Year  Tot Sales/Month  Tot Sales/Year
0   89825870      1  2017        338319970       967812170
1  248494100      1  2017        338319970       967812170
2  216344700      2  2017        216344700       967812170
3  209009300      3  2017        413147500       967812170
4  204138200      3  2017        413147500       967812170
5   12456789      1  2018        122333332       122333332
6  109876543      1  2018        122333332       122333332
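
For comparison, the asker's original aggregation can also be turned into a column by merging it back onto the frame. This is a different technique from transform, shown only as a sketch to make the relationship between the two visible; the names `monthly`, `merged` and `via_transform` are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [89825870, 1, 2017],
        [248494100, 1, 2017],
        [216344700, 2, 2017],
        [209009300, 3, 2017],
        [204138200, 3, 2017],
        [12456789, 1, 2018],
        [109876543, 1, 2018],
    ],
    columns=["Sales", "Month", "Year"],
)

# Step 1: aggregate, which collapses the data to one row per (Month, Year) group.
monthly = (
    df.groupby(["Month", "Year"], as_index=False)["Sales"]
    .sum()
    .rename(columns={"Sales": "Tot Sales/Month"})
)

# Step 2: join the group totals back onto the original rows.
merged = df.merge(monthly, on=["Month", "Year"], how="left")

# transform does both steps at once and keeps the original index.
via_transform = df.groupby(["Month", "Year"])["Sales"].transform("sum")

# Compare the two approaches row by row.
print((merged["Tot Sales/Month"] == via_transform).all())
```

The merge route takes two steps and builds a new frame, which is why transform is usually the more direct choice when the goal is simply a per-row group total.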

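Use:

df['Tot Sales/Year'] = df.groupby('Year')['Sales'].transform('sum')

The trick here is to use transform, which returns a result of the same length as the original data, so the rows are not collapsed by the aggregation. For example:

df['Tot Sales/Month'] = df.groupby(['Month', 'Year'])['Sales'].transform('sum')

Btw, the pandas documentation has an extensive section that gives examples comparing pandas with SQL.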
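
To make the "same length" point concrete, here is a minimal sketch on a made-up four-row frame (the data and the variable names `aggregated` and `broadcast` are assumptions for illustration only). It contrasts a plain groupby sum with the transform version.

```python
import pandas as pd

# Made-up data, not the asker's.
df = pd.DataFrame({"Sales": [10, 20, 30, 40], "Year": [2017, 2017, 2018, 2018]})

# Plain aggregation: one value per group, indexed by Year.
aggregated = df.groupby("Year")["Sales"].sum()

# transform: the group total repeated on every row, indexed like df.
broadcast = df.groupby("Year")["Sales"].transform("sum")

print(len(aggregated), len(broadcast))
print(broadcast.tolist())
```

Because `broadcast` shares the index of `df`, it can be assigned straight to a new column, much like SUM(...) OVER (PARTITION BY ...) attaches a group total to every row in SQL.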

Comments:

I believe the first statement doesn't work. It results in the following error: "ValueError: Wrong number of items passed 3, placement implies 1"

It works for me. Can you try this complete example on its own?

OK, now that you have added the code that defines the DataFrame, your entire code works. I'm not sure why the code that creates the "sales/month" column doesn't work with the DataFrame I defined in my answer.

I'm also wondering whether it has something to do with how you set up the DataFrame.
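
A likely explanation for the error discussed in these comments is that calling transform without first selecting a column returns every non-key column, so on a frame with extra columns the result no longer fits into a single new column. The sketch below reproduces that on a made-up frame; the `Units` column and the variable names are assumptions, and the exact error message depends on the pandas version.

```python
import pandas as pd

# Made-up frame with an extra numeric column ("Units") besides "Sales".
df = pd.DataFrame(
    {
        "Sales": [100, 200, 300],
        "Units": [1, 2, 3],
        "Month": [1, 1, 2],
        "Year": [2017, 2017, 2017],
    }
)

# No column selected: the result has one column per non-key column.
wide = df.groupby(["Month", "Year"]).transform("sum")
print(list(wide.columns))

# Assigning a multi-column result to one new column is rejected.
try:
    df["sales/month"] = wide
    assigned = True
except ValueError as exc:
    assigned = False
    print("ValueError:", exc)

# Selecting "Sales" first yields a single Series, which assigns cleanly.
df["sales/month"] = df.groupby(["Month", "Year"])["Sales"].transform("sum")
print(df)
```

With only `Sales` left after grouping, as in the answer's own example frame, the unselected form happens to return a single column, which would explain why it worked for one person and not the other.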