Python: What is the difference between pd.groupby().first() and pd.groupby().min()?

python pandas

Hi all, I have a DataFrame:

df= pd.DataFrame({'Point_ID':[1,2,3,1,2,1] , 'Shape_ID': [84,85,86,87,88,89],'LOL':[0,1,0,1,np.nan,np.nan]})

Out[1116]: 
   LOL  Point_ID  Shape_ID
0  0.0         1        84
1  1.0         2        85
2  0.0         3        86
3  1.0         1        87
4  NaN         2        88
5  NaN         1        89

When I run this:

df.groupby('Point_ID').last()
Out[1114]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

This returns the last value for Shape_ID, but shouldn't it return NaN for LOL?

Using max, I get the same answer as last() gives on the sorted DataFrame:

df.groupby('Point_ID').max()

Out[1115]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

I have been reading the pandas documentation for first and last and could not find the answer. Can anyone help? Many thanks :-)

Demo:

Let's shuffle your DataFrame:

In [339]: df = df.sample(frac=1)

In [340]: df
Out[340]:
   LOL  Point_ID  Shape_ID
4    0         2        88
0    0         1        84
1    0         2        85
3    1         1        87
2    0         3        86
5   -1         1        89

In [341]: df.groupby('Point_ID').min()
Out[341]:
          LOL  Shape_ID
Point_ID
1          -1        84
2           0        85  #  <----
3           0        86

In [342]: df.groupby('Point_ID').first()
Out[342]:
          LOL  Shape_ID
Point_ID
1           0        84
2           0        88  #  <----
3           0        86
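
The demo above relies on a random shuffle, so it will not reproduce exactly. A minimal sketch with the same values in a fixed row order (the variable names are illustrative, not from the original answer) shows that first() depends on row order while min() does not:

```python
import pandas as pd

# Same values as the shuffled frame above, in a fixed order so the result is reproducible.
df = pd.DataFrame(
    {
        "LOL": [0, 0, 0, 1, 0, -1],
        "Point_ID": [2, 1, 2, 1, 3, 1],
        "Shape_ID": [88, 84, 85, 87, 86, 89],
    },
    index=[4, 0, 1, 3, 2, 5],
)

first_rows = df.groupby("Point_ID").first()  # first value encountered in each group
min_vals = df.groupby("Point_ID").min()      # column-wise minimum in each group

# Restoring the original row order changes first() but leaves min() untouched.
first_sorted = df.sort_index().groupby("Point_ID").first()
min_sorted = df.sort_index().groupby("Point_ID").min()

print(first_rows)
print(min_vals)
```

For Point_ID 2, first() gives Shape_ID 88 on the shuffled frame and 85 once the rows are back in index order, whereas min() gives 85 either way.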

It simply returns the values from the last row for each Point_ID.
Consider the df with one extra row added to your example:
    LOL Point_ID    Shape_ID
0   0   1           84
1   0   2           85
2   0   3           86
3   1   1           87
4   0   2           88
5   -1  1           89
6   2   1           25

If you group by Point_ID and take last():
df.groupby('Point_ID').last()

you get:
        LOL Shape_ID
Point_ID        
1       2   25
2       0   88
3       0   86

Here the LOL value happens to equal the maximum, but it is not computed as the max; it is simply the LOL value from the last row with Point_ID 1.
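
To check this, here is a small sketch that compares last() and max() side by side. It assumes the added row is LOL=2, Point_ID=1, Shape_ID=25, which is what the output shown above implies:

```python
import pandas as pd

# Example frame plus one extra row (index 6); the extra row's values are an
# assumption inferred from the last() output shown in the answer.
df = pd.DataFrame(
    {
        "LOL": [0, 0, 0, 1, 0, -1, 2],
        "Point_ID": [1, 2, 3, 1, 2, 1, 1],
        "Shape_ID": [84, 85, 86, 87, 88, 89, 25],
    }
)

last_rows = df.groupby("Point_ID").last()  # values from the last row of each group
max_vals = df.groupby("Point_ID").max()    # column-wise maximum in each group

print(last_rows)
print(max_vals)
```

For Point_ID 1 the two agree on LOL (both 2) but differ on Shape_ID: last() gives 25 and max() gives 89, so the match on LOL is a coincidence of the data.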
Do check the discussion of this same behavior; it says that skipping NaN is currently a feature of first/last. If you do not want this behavior, use nth with dropna=False:
df.groupby('Point_ID').nth(-1,dropna=False)

        LOL Shape_ID
Point_ID        
1       NaN 89
2       NaN 88
3       0.0 86
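
As a hedged alternative sketch (not part of the original answer), tail(1) also keeps the literal last row of each group, NaN included. It is used here instead of nth because, as far as I know, the shape of what nth returns differs between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Point_ID": [1, 2, 3, 1, 2, 1],
        "Shape_ID": [84, 85, 86, 87, 88, 89],
        "LOL": [0, 1, 0, 1, np.nan, np.nan],
    }
)

# last() picks, per column, the last non-null value in each group.
last_non_null = df.groupby("Point_ID").last()

# tail(1) keeps the literal last row of each group, so NaN is preserved.
last_row = df.groupby("Point_ID").tail(1).set_index("Point_ID").sort_index()

print(last_non_null)
print(last_row)
```

With the question's data, last() reports LOL 1.0 for Point_ID 1 and 2, while the tail(1) version reports NaN for both; Shape_ID is the same in each result.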

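I am not sure I understand your question, but first() and last() return the first and last element of each group. It is that simple. If 1 is your key, the last LOL is -1.

It does not return the minimum of LOL, only the value from the last row for that Point_ID.

@Wen, you may want to check this, then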