Python subset-based DataFrame algorithm


python pandas


The DataFrame looks like this:

Country Column1 Product Week        Val
UK      S1      A       2019-36     10
UK      S1      A       2019-37     20
UK      S1      A       2019-38     30
UK      S1      B       2019-36     30
UK      S1      B       2019-37     30
UK      S1      B       2019-38     30
DE      S1      A       2019-39     100
DE      S1      A       2019-40     100
DE      S1      A       2019-41     100
DE      S1      B       2019-36     10
DE      S1      B       2019-37     15
DE      S1      B       2019-38     10

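In words: if Product == "B", take the Val of the product "A" row whose other columns (Country, Column1 and Week) are all the same, and add 50% of that Val to the current value.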

For example, the first "B" should become 35:

30 + (50%*10)

The second should become 40:

30 + (50%*20)

The third should become 45:

30 + (50%*30)
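The rule can also be read as a left join of the B rows onto the A rows over the three key columns. The sketch below uses DataFrame.merge, which is a swapped-in technique rather than one of the answers; the frame name df is an assumption, and B rows with no matching A row are assumed to keep their value (missing A counts as 0):

```python
import pandas as pd

# Sample data from the question (frame name df is an assumption)
df = pd.DataFrame({
    "Country": ["UK"] * 6 + ["DE"] * 6,
    "Column1": ["S1"] * 12,
    "Product": ["A", "A", "A", "B", "B", "B"] * 2,
    "Week": ["2019-36", "2019-37", "2019-38"] * 2
    + ["2019-39", "2019-40", "2019-41"]
    + ["2019-36", "2019-37", "2019-38"],
    "Val": [10, 20, 30, 30, 30, 30, 100, 100, 100, 10, 15, 10],
})

keys = ["Country", "Column1", "Week"]

# The A rows, with their Val renamed so it does not collide with B's Val
a = df.loc[df["Product"] == "A", keys + ["Val"]].rename(columns={"Val": "Val_A"})

# Left merge keeps every B row (in order) and attaches the matching A value, or NaN
b = df.loc[df["Product"] == "B"].merge(a, on=keys, how="left")

# B + 50% of A; a missing A value is treated as 0
b["Val"] = b["Val"] + 0.5 * b["Val_A"].fillna(0)
print(b[keys + ["Val"]])
```

For the UK rows this reproduces 35, 40 and 45; the DE "B" rows have no A partner and stay at 10, 15 and 10.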

Filter on the product column, then use groupby on "Country", "Column1" and "Week":

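# Sum Val per (Country, Column1, Week) separately for product B and product A
B = df[df['Product'] == 'B'].groupby(['Country', 'Column1', 'Week'])['Val'].sum()
A = df[df['Product'] == 'A'].groupby(['Country', 'Column1', 'Week'])['Val'].sum()
# Index alignment matches the keys; keys present on only one side give NaN
0.5 * A + B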

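This only works if every (Country, Column1, Week) combination has a unique value for each product. Keys that exist for only one product (such as the DE rows here) come out as NaN.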
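To see the caveat concretely, the sketch below rebuilds the sample data (the name df is assumed) and compares the plain sum with a variant that first reindexes A onto B's keys, so unmatched B rows keep their own value. The reindex variant is an assumed alternative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["UK"] * 6 + ["DE"] * 6,
    "Column1": ["S1"] * 12,
    "Product": ["A", "A", "A", "B", "B", "B"] * 2,
    "Week": ["2019-36", "2019-37", "2019-38"] * 2
    + ["2019-39", "2019-40", "2019-41"]
    + ["2019-36", "2019-37", "2019-38"],
    "Val": [10, 20, 30, 30, 30, 30, 100, 100, 100, 10, 15, 10],
})

keys = ["Country", "Column1", "Week"]
A = df[df["Product"] == "A"].groupby(keys)["Val"].sum()
B = df[df["Product"] == "B"].groupby(keys)["Val"].sum()

# Plain arithmetic aligns on the union of both indexes;
# the DE keys appear on only one side, so they become NaN
naive = 0.5 * A + B

# Restrict A to B's keys first and treat a missing A value as 0
safe = B + 0.5 * A.reindex(B.index).fillna(0)

print(naive)
print(safe)
```

Here naive has nine keys, six of them NaN, while safe keeps only the six B keys with the DE values unchanged.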

How about working with the index?

Assuming your data is in a pandas.DataFrame named data:
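# Work in float so adding 50% never clashes with an integer dtype
data["Val"] = data["Val"].astype(float)
# Index by all four identifying columns, keeping them as columns as well
data = data.set_index(["Country", "Column1", "Week", "Product"], drop=False)
# Split into A rows and B rows, both indexed by the shared keys
df1 = data[data["Product"] == "A"].set_index(["Country", "Column1", "Week"], drop=False)
df2 = data[data["Product"] == "B"].set_index(["Country", "Column1", "Week"], drop=False)
# Index alignment adds 50% of A to B where all keys match; unmatched B rows become NaN
df2["Val"] += df1["Val"] * 0.5
# Restore the four-level index so df2 lines up with data
df2 = df2.set_index(["Country", "Column1", "Week", "Product"])
# update() only copies non-NA values, so unmatched B rows keep their original Val
data.update(df2)
# Replace the MultiIndex with a plain 0..n-1 index
data = data.reset_index(drop=True)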

I think the advantage of this approach is that it does exactly what you asked for and produces the result in place. It yields:
   Country Column1 Product     Week    Val
0       UK      S1       A  2019-36   10.0
1       UK      S1       A  2019-37   20.0
2       UK      S1       A  2019-38   30.0
3       UK      S1       B  2019-36   35.0
4       UK      S1       B  2019-37   40.0
5       UK      S1       B  2019-38   45.0
6       DE      S1       A  2019-39  100.0
7       DE      S1       A  2019-40  100.0
8       DE      S1       A  2019-41  100.0
9       DE      S1       B  2019-36   10.0
10      DE      S1       B  2019-37   15.0
11      DE      S1       B  2019-38   10.0
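The DE "B" rows keep their original values because df2.Val + df1.Val * .5 is NaN wherever there is no matching A row, and DataFrame.update only copies non-NA values from the other frame. A tiny sketch of that behaviour, with hypothetical row labels x, y and z:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: target plays the role of data, patch the role of df2
target = pd.DataFrame({"Val": [10.0, 15.0, 30.0]}, index=["x", "y", "z"])
patch = pd.DataFrame({"Val": [np.nan, np.nan, 45.0]}, index=["x", "y", "z"])

# In-place update: NaN entries in patch are skipped, so only "z" changes
target.update(patch)
print(target)
```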

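By Column2, do you mean Product? What does "add 50%" mean, and what is "that"? Why is the first B 35? Is there always a one-to-one match between product A and product B?

@vealkind Yes, my mistake, I have corrected the labels.

Thank you, this does look like what I need. Is there any way to keep the DE values and still keep the Product column? For example, by just changing the data in the input DataFrame instead of creating a separate one?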
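One way to change the input frame directly, sketched under the assumption of exactly one A row per (Country, Column1, Week) key and a frame named df, is to look up each B row's A value and add 50% of it with .loc. The A rows, the unmatched DE rows and the Product column are left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["UK"] * 6 + ["DE"] * 6,
    "Column1": ["S1"] * 12,
    "Product": ["A", "A", "A", "B", "B", "B"] * 2,
    "Week": ["2019-36", "2019-37", "2019-38"] * 2
    + ["2019-39", "2019-40", "2019-41"]
    + ["2019-36", "2019-37", "2019-38"],
    "Val": [10, 20, 30, 30, 30, 30, 100, 100, 100, 10, 15, 10],
})

keys = ["Country", "Column1", "Week"]

# Val of product A, indexed by the three key columns (assumes one A row per key)
a_vals = df.loc[df["Product"] == "A"].set_index(keys)["Val"]

is_b = df["Product"] == "B"
# For every B row, look up the A value with the same keys; no match -> NaN -> 0
b_keys = pd.MultiIndex.from_frame(df.loc[is_b, keys])
bonus = a_vals.reindex(b_keys).fillna(0).to_numpy()

# Work in float so adding 50% never hits an int/float dtype conflict
df["Val"] = df["Val"].astype(float)
# Modify only the B rows of the original frame
df.loc[is_b, "Val"] += 0.5 * bonus

print(df)
```

The row order and the default 0..11 index are preserved, so no re-indexing step is needed afterwards.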