Python: applying a condition to grouped data


python pandas dataframe


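I previously asked a similar question for R, and I am now trying to replicate the same task in Python. The solution I got in that post is close to what I am looking for.

Basically, I need to conditionally create a new column based on grouped data.

Here is some sample data: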

import pandas as pd

test = pd.DataFrame(data={"Group":[1,1,1,1,1,1,2,2,2,2,2,2],"time": 
[0,1,2,3,4,5,0,1,2,3,4,5],"index": 
[1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95]})

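I now want to create a new column, "new_index", that equals index before time 3 but, from time 3 onward, grows at a different rate, say 10% per time step, compounding on its own previous value. The data would then look like this: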

test2 = pd.DataFrame(data={"Group":[1,1,1,1,1,1,2,2,2,2,2,2],"time": 
[0,1,2,3,4,5,0,1,2,3,4,5],"index": 
[1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95],"new_index": 
[1,1.1,1.4,1.54,1.694,1.8634,1,1.4,1.5,1.65,1.815,1.9965]})
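
The target values above can be cross-checked in closed form: within each group, every row after time 2 equals the group's index at time 2 multiplied by 1.1 raised to the number of steps since time 2. The sketch below is only a check of the expected numbers, not one of the answers from the thread; the names cutoff, rate, base_by_group and steps are made up for illustration.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
})

cutoff = 2   # assumption: last time step that keeps the original value
rate = 0.10  # assumption: growth per time step after the cutoff

# One base value per group: the "index" observed at the cutoff time.
base_by_group = test.loc[test["time"] == cutoff].set_index("Group")["index"]
base = test["Group"].map(base_by_group)

# Number of growth steps applied to each row (0 up to and including the cutoff).
steps = (test["time"] - cutoff).clip(lower=0)

test["new_index"] = np.where(
    test["time"] <= cutoff,
    test["index"],
    base * (1 + rate) ** steps,
)
print(test)
```

This relies on each group having exactly one row at the cutoff time, which matches the sample data.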

I tried some code like this, but it does not work:

def gr_adj(df):
    if df["time"] <= 2:
        return df["index"]
    else:
        return np.cumprod(df["new_index"])

test["new_index"] = test.groupby("Group",group_keys=False).apply(gr_adj)
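
One likely reason the attempt above fails: groupby(...).apply hands each group to the function as a whole DataFrame, so df["time"] <= 2 is a boolean Series rather than a single True or False, and a plain if statement cannot reduce it. (The attempt also uses np without importing it and reads "new_index" before that column exists.) A minimal sketch of the first problem, using a made-up Series named times:

```python
import pandas as pd

times = pd.Series([0, 1, 2, 3])

try:
    # The comparison yields one boolean per row, so `if` has no single
    # truth value to test and pandas raises a ValueError.
    if times <= 2:
        pass
except ValueError as err:
    message = str(err)
    print(message)

# The element-wise result is still useful as a mask for where()/loc.
mask = times <= 2
print(mask.tolist())
```

This is why vectorized tools such as where, loc with a boolean mask, or numpy.where are the usual replacement for if/else on whole columns.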

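Here is one way using cumprod. First replace every index value at time 3 or later with 1.1, then slice the result to drop the rows that do not need updating (time below 2), group by Group and take the cumprod, and finally assign the result back. Note that this overwrites the index column instead of adding new_index.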
s=test['index'].where(test['time']<3,1.1).loc[test['time']>=2].groupby(test['Group']).cumprod()
test.loc[test['time']>=2,'index']=s
test
Out[290]: 
    Group  time   index
0       1     0  1.0000
1       1     1  1.1000
2       1     2  1.4000
3       1     3  1.5400
4       1     4  1.6940
5       1     5  1.8634
6       2     0  1.0000
7       2     1  1.4000
8       2     2  1.5000
9       2     3  1.6500
10      2     4  1.8150
11      2     5  1.9965
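
If the original index column should be kept, the same cumprod idea can write into a separate new_index column. This is a sketch under the assumptions of the sample data (rows ordered by time within each group, one row per time step); cutoff, rate, factors and grown are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "time": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5],
    "index": [1, 1.1, 1.4, 1.5, 1.6, 1.67, 1, 1.4, 1.5, 1.6, 1.93, 1.95],
})

cutoff = 2   # assumption: last time step that keeps the original value
rate = 0.10  # assumption: growth per time step after the cutoff

# Keep the observed value up to the cutoff; afterwards use the growth factor.
factors = test["index"].where(test["time"] <= cutoff, 1 + rate)

# Running product per group, starting from the value at the cutoff row.
grown = factors[test["time"] >= cutoff].groupby(test["Group"]).cumprod()

# Start from a copy of "index" and overwrite only the rows that were grown.
test["new_index"] = test["index"]
test.loc[grown.index, "new_index"] = grown
print(test)
```

The groupby key is the full test["Group"] column even though factors was sliced first; pandas lines the two up by row label before grouping.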

Here is another answer, which increases the index by 10% whenever time is greater than 2:
import pandas as pd

test = pd.DataFrame(data={"Group":[1,1,1,1,1,1,2,2,2,2,2,2],"time": [0,1,2,3,4,5,0,1,2,3,4,5],"index": [1,1.1,1.4,1.5,1.6,1.67,1,1.4,1.5,1.6,1.93,1.95]})

def gr_adj(row):
    if row["time"] <= 2:
        return row["index"]
    else:
        return row["index"] + (row["index"] * 0.1)

test["new_index"] = test.apply(gr_adj, axis=1)

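This uses the values of each row as input to the function and applies it to every row. If time is greater than 2, new_index is set to index plus 10%; the growth does not compound from one row to the next.

    Group  time  index  new_index
0       1     0   1.00      1.000
1       1     1   1.10      1.100
2       1     2   1.40      1.400
3       1     3   1.50      1.650
4       1     4   1.60      1.760
5       1     5   1.67      1.837
6       2     0   1.00      1.000
7       2     1   1.40      1.400
8       2     2   1.50      1.500
9       2     3   1.60      1.760
10      2     4   1.93      2.123
11      2     5   1.95      2.145

@d_kennetz Yes, I want the new index to grow based on its own previous observation, so that after time 3 it grows independently of "index".

Are the values in the time column cyclic and always ordered?

@SMir Yes, each group has the same number of time rows, and they are ordered.

I recently had to use this code again, thank you very much! I have a more general question: how can you take one Series and filter it using other Series? Does a Series store information about the other Series in the same DataFrame?

@Elision Because the index is matched first, before the groupby~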
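
To illustrate the last comment: as far as I understand, a Series does not store anything about its sibling columns. What it carries is its row labels (the index), and pandas matches on those labels when a Series is filtered with a mask or grouped by another Series. A small sketch with a made-up four-row frame named df:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1, 1, 2, 2],
    "time": [0, 3, 0, 3],
    "index": [1.0, 1.5, 1.0, 1.6],
})

# Filtering keeps the original row labels (here 1 and 3).
subset = df["index"].loc[df["time"] >= 3]
print(subset.index.tolist())

# The grouping key is the full "Group" column; pandas aligns it to the
# subset by row label, so only the labels 1 and 3 are used.
sums = subset.groupby(df["Group"]).sum()
print(sums.to_dict())
```

This is the mechanism that lets the cumprod answer slice a column first and still group by test['Group'] afterwards.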