Python: Slicing a DataFrame based on a group property

python pandas dataframe

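I have a pandas DataFrame with, say, two columns, Group and R (negative numbers here). For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': np.random.randint(0, 5, 20),
                   'R': np.random.rand(20) * -10.0})

I would like to create a new DataFrame that keeps, within each group (rows sharing the same Group), only the row with the smallest R and the rows whose R is less than, say, the smallest R + 3.

For example, with df sorted by Group and then by R for clarity, the function should return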

Group    R  
1       -10.1
1       -12.3
2       -8.7
2       -9.0
2       -11.4
2       -11.5

How would you do that?

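Use groupby:

# Broadcast each group's maximum R to every row of that group
df['Max'] = df.groupby('Group')['R'].transform('max')
# Keep rows within 3 of the group maximum, then drop the helper column
df[(df['Max'] - df['R']) < 3].drop(columns='Max')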

Out[105]: 
   Group     R
0      1 -10.1
1      1 -12.3
3      2  -8.7
4      2  -9.0
5      2 -11.4
6      2 -11.5
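
For readers who want to run the approach above end to end, here is a minimal self-contained sketch. The sample values are made up for illustration (the question's input frame is not shown in the thread), and the names `group_max` and `result` are my own. It skips the helper column and compares against the broadcast group maximum directly.

```python
import pandas as pd

# Hypothetical sample data; the question's original input is not shown.
df = pd.DataFrame({
    'Group': [1, 1, 1, 2, 2, 2, 2, 2],
    'R': [-10.1, -12.3, -14.0, -8.7, -9.0, -11.4, -11.5, -12.0],
})

# transform('max') returns a Series aligned with df's index,
# holding each row's group maximum.
group_max = df.groupby('Group')['R'].transform('max')

# Keep rows whose R lies within 3 of their group's maximum.
result = df[(group_max - df['R']) < 3]
print(result)
```

Because `transform` preserves the original index, the comparison yields a boolean mask that can be passed straight to `df[...]` without adding or dropping a column.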

I would first group by 'Group' and build a boolean mask that marks whether each value in the group is less than the group's minimum R plus 3. Then use that mask to filter the original DataFrame.
# group_keys=False keeps the original row index so the mask aligns with df
keep = df.groupby('Group', group_keys=False)['R'].apply(lambda x: x < x.min() + 3)
keep
0     True
1     True
2    False
3     True
4     True
5     True
6     True
7    False
8    False
....

df[keep].sort_values(['Group', 'R'], ascending=[True, False])

   Group     R
0      1 -10.1
1      1 -12.3
3      2  -8.7
4      2  -9.0
5      2 -11.4
6      2 -11.5
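
As a possible variation on the mask idea above, the same kind of mask can be built without a Python-level lambda by broadcasting the group minimum with `transform('min')`. This is only a sketch: the sample data and the names `group_min`, `mask`, and `kept` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the mask.
df = pd.DataFrame({
    'Group': [1, 1, 1, 2, 2, 2],
    'R': [-10.1, -12.3, -14.0, -8.7, -9.0, -12.5],
})

# Each row receives the minimum R of its own group.
group_min = df.groupby('Group')['R'].transform('min')

# True where R is less than the group minimum plus 3.
mask = df['R'] < group_min + 3

# The mask shares df's index, so it can filter df directly.
kept = df[mask].sort_values(['Group', 'R'], ascending=[True, False])
print(kept)
```

Note that a criterion based on the group minimum generally selects different rows than one based on the group maximum, so it is worth checking which one matches the intended result.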

Sort first, then select with a boolean mask:
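df = df.sort_values(['Group', 'R'], ascending=[True, False])
# R is sorted in descending order within each group, so x.iat[0] is the group maximum;
# group_keys=False keeps the original row index so the mask aligns with df
df = df[df.groupby('Group', group_keys=False)['R'].apply(lambda x: x > x.iat[0] - 3)]
print(df)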
   Group     R
0      1 -10.1
1      1 -12.3
3      2  -8.7
4      2  -9.0
5      2 -11.4
6      2 -11.5

Similar solution:
df[(df.groupby('Group')['R'].transform('max')-df['R'])<3]

df = df.groupby('Group')['R'].apply(lambda x: x[x > x.iat[0] - 3]).reset_index(level=0)
print(df)

   Group     R
0      1 -10.1
1      1 -12.3
3      2  -8.7
4      2  -9.0
5      2 -11.4
6      2 -11.5

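Do you mean the smallest or the largest, plus 3?

The smallest (negative) plus 3. If the smallest is -10, then I want to select R between -10 and -7.

Very nice, cheers, mate! I didn't know about iat, but it seems to be similar to iloc.

Yes, it is iloc for scalars: iloc can return multiple rows, iat returns only a scalar.

Thanks, I understand. :)

Unfortunately, it doesn't work for me… it keeps values that should not be kept (and I lose the Group column, which I need).

Interesting, it works for me. Is your real solution a bit different? I just tested it with a random simulated example.

Maybe I should mention that I am using Python 2.7… could that be why?

Great. Now I see why I was confused: I had a misunderstanding of how groupby.apply works. It is clearer now, thanks! :)

Thanks, bud. Your solution is also interesting, and it taught me a lot! :)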
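
The difference between iat and iloc raised in the comments can be checked with a small sketch. The Series `s` and the variable names here are made up for illustration.

```python
import pandas as pd

# Hypothetical Series with a non-default index, as after filtering a frame.
s = pd.Series([-8.7, -9.0, -11.4], index=[3, 4, 5])

# iat takes a single integer position and returns one scalar value.
first = s.iat[0]

# iloc with a single integer also returns a scalar here...
also_first = s.iloc[0]

# ...but iloc additionally accepts slices and lists, returning a Series.
head = s.iloc[0:2]

print(first, also_first)
print(head)
```

Both accessors work by position rather than by label, which is why `s.iat[0]` returns the first element even though the first index label is 3.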