Python Pandas: Efficiently updating column values based on a function

python pandas dataframe

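I'll use a small example to summarize what I'm trying to do. Suppose we have a dataframe in which two of the columns (out of about 15) look like this: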


    change  period 
0    -1       1
1    -1       1
2    0.0      1
3    -1       1
4     1       2
5     1       2
6    0.0      2
7    0.0      2
8     1       2
9    -1       3

...
...

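This extends to about 25 million rows. Essentially, I want to change every 0.0 in the change column to the direction value of its period (that is, the -1 or +1 indicating direction), not including the first entry in the period.

Currently I'm running the following function, but with this many rows it takes hours, which I can't afford: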

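def getPeriodDirection(period):
    # median of the non-zero 'change' values within the given period
    val = df.loc[(df['period'] == period) & (df['change'] != 0.0), 'change'].median()
    return val

# row-by-row apply: rescans the whole frame for every zero row, hence very slow
df['change'] = df.apply(lambda row: getPeriodDirection(row['period']) if row['change'] == 0.0 else row['change'], axis=1)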

I've tried several approaches, such as using .loc, but I can't get exactly what I need. I tried the following:

directionNoChange = df['change'].isin(range(0, 1))
df.loc[directionNoChange, 'change'] = getPeriodDirection(df, df['period'])

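This got me very close. For the rows where 'change' == 0.0, I ended up with a dataframe that keeps the original index and holds the correct values from the function. For the example above, it would produce: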

  change
2   -1
6    1
7    1

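The next step, of course, is to write those values back into the original dataframe, aligned on the index. But since I'm not very familiar with the API, I've been having a lot of trouble.

Any help is greatly appreciated.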

If you want to replace 0.0 with the median of its group, you can use .mask to turn the 0.0 values into NaN and then fill them with the median:

print(df)

   change  period
0    -1.0       1
1    -1.0       1
2     0.0       1
3    -1.0       1
4     1.0       2
5     1.0       2
6     0.0       2
7     0.0       2
8     1.0       2
9    -1.0       3

# mask takes a condition and fills the True values with NaN
print(df.change.mask(cond = df.change == 0))

0   -1.0
1   -1.0
2    NaN
3   -1.0
4    1.0
5    1.0
6    NaN
7    NaN
8    1.0
9   -1.0
Name: change, dtype: float64

# use the other parameter similar to a fillna method
df['change'] = df.change.mask(cond = df.change == 0, other = df.groupby('period').change.transform('median'))

print(df)

   change  period
0    -1.0       1
1    -1.0       1
2    -1.0       1
3    -1.0       1
4     1.0       2
5     1.0       2
6     1.0       2
7     1.0       2
8     1.0       2
9    -1.0       3
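
One caveat: the original function in the question took the median of the non-zero values only, whereas `groupby('period').change.transform('median')` above includes the zeros. The two agree on the sample data, but a period made up mostly of zeros would get a median of 0. A minimal sketch of a variant that leaves the zeros out of the median, assuming the same `change` and `period` columns (the names `is_zero`, `nonzero` and `period_median` are only illustrative):

```python
import pandas as pd

# Sample data copied from the example above
df = pd.DataFrame({
    "change": [-1.0, -1.0, 0.0, -1.0, 1.0, 1.0, 0.0, 0.0, 1.0, -1.0],
    "period": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3],
})

is_zero = df["change"] == 0

# Hide the zeros as NaN so they do not take part in the median
nonzero = df["change"].mask(is_zero)

# Per-period median of the non-zero values, broadcast back to every row
period_median = nonzero.groupby(df["period"]).transform("median")

# Replace only the zero rows; every other row keeps its value
df["change"] = df["change"].mask(is_zero, period_median)

print(df["change"].tolist())
```

Everything here is vectorized, so there is no per-row Python call as in the `apply` version. A period that contains only zeros would end up as NaN, which may or may not be what you want.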

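What value do you want in place of zero: 1? It isn't clear.

I want the mean/median of the 'change' column within the given period. So period 1 would be -1, period 2 would be 1, etc.

Thanks! Very nice way of solving the problem, I didn't know about mask(). Quick question: when you group by period and compute the median of the 'change' column, how does it know which period each row belongs to?

Hopefully the .transform function answers that. The documentation states: "Call func on self producing a DataFrame with transformed values and that has the same axis length as self." Since I'm grouping on period, .transform knows which indices belong to which group, so it knows where to place each period's median.

Thank you so much, this really helped me a lot!

You're welcome. If this meets your needs, please select it as the correct answer and consider upvoting.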
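
To make the point about .transform concrete, here is a small sketch comparing a plain group aggregation with `transform` on the sample data from the answer. The variable names `per_group` and `per_row` are only illustrative:

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    "change": [-1.0, -1.0, 0.0, -1.0, 1.0, 1.0, 0.0, 0.0, 1.0, -1.0],
    "period": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3],
})

grouped = df.groupby("period")["change"]

# Aggregation: one median per period, indexed by the period label
per_group = grouped.median()

# Transform: the same medians, repeated for each row and indexed like df
per_row = grouped.transform("median")

print(per_group)
print(per_row)

# transform gives the same result as looking up each row's period
# in the aggregated result
assert (per_row == df["period"].map(per_group)).all()
```

Because `per_row` carries the same index as `df`, it can be passed directly as `other` to `.mask` (or assigned through `.loc`), and pandas lines the values up row by row.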