Python: Groupby and assign a unique ID to group members

python pandas

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange', 'orange', 'orange'],
                   'distance': [10, 0, 20, 40, 20, 50, 70, 90, 110, 130]})
df

    fruit  distance
0   apple        10
1   apple         0
2   apple        20
3   apple        40
4  orange        20
5  orange        50
6  orange        70
7  orange        90
8  orange       110
9  orange       130

I want to add a unique ID to each group member, numbered in order of distance within its group, like this:

    fruit  distance        ID
0   apple        10   apple_2
1   apple         0   apple_1
2   apple        20   apple_3
3   apple        40   apple_4
4  orange        20  orange_1
5  orange        50  orange_2
6  orange        70  orange_3
7  orange       130  orange_6
8  orange       110  orange_5
9  orange        90  orange_4

My attempts at sorting, grouping, and looping have not worked so far.

Use:
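A minimal sketch of one approach that yields the output shown below, assuming a per-group rank of distance is what is wanted. The variable name rank_in_group is illustrative only, and method='first' is chosen here so that ties, if any, still get distinct numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple'] * 4 + ['orange'] * 6,
    'distance': [10, 0, 20, 40, 20, 50, 70, 90, 110, 130],
})

# Rank distances within each fruit group. method='first' breaks ties by
# order of appearance, so every rank inside a group is distinct.
rank_in_group = df.groupby('fruit')['distance'].rank(method='first').astype(int)

# Build the ID as "<fruit>_<rank>"; the row order of df is left untouched.
df['ID'] = df['fruit'] + '_' + rank_in_group.astype(str)
print(df)
```

Because rank returns a Series with the same index as df, no sorting or re-alignment step is needed before the assignment.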

Output:

    fruit  distance        ID
0   apple        10   apple_2
1   apple         0   apple_1
2   apple        20   apple_3
3   apple        40   apple_4
4  orange        20  orange_1
5  orange        50  orange_2
6  orange        70  orange_3
7  orange        90  orange_4
8  orange       110  orange_5
9  orange       130  orange_6

IIUC, sort followed by groupby and cumcount, with string concatenation at the end. I'm not sure which sort order you want, but this should work:

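# Sort by fruit and distance, then number the rows within each fruit group starting at 1
nums = (df.sort_values(["fruit", "distance"]).groupby(["fruit"]).cumcount() + 1).astype(str)

# Concatenation aligns on the index, so each number lands on its original row
df['ID'] = df['fruit'] + '_' + nums
print(df)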
    fruit  distance        ID
0   apple        10   apple_2
1   apple         0   apple_1
2   apple        20   apple_3
3   apple        40   apple_4
4  orange        20  orange_1
5  orange        50  orange_2
6  orange        70  orange_3
7  orange        90  orange_4
8  orange       110  orange_5
9  orange       130  orange_6
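The reason the sorted numbering ends up on the right rows is index alignment. The following is a self-contained sketch of that behaviour, not part of the original answer; the names sorted_df and position are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple'] * 4 + ['orange'] * 6,
    'distance': [10, 0, 20, 40, 20, 50, 70, 90, 110, 130],
})

# Sorting reorders the rows but keeps the original index labels.
sorted_df = df.sort_values(['fruit', 'distance'])

# cumcount numbers rows 0, 1, 2, ... within each group, in sorted order.
position = sorted_df.groupby('fruit').cumcount() + 1

# The labels are still the original ones, just in sorted order.
print(position.index.tolist())

# Series arithmetic and column assignment both align on index labels,
# so each number is matched to the row it was computed for.
df['ID'] = df['fruit'] + '_' + position.astype(str)
print(df)
```

If the index contained duplicate labels this alignment would be ambiguous, so calling reset_index(drop=True) first may be worth considering in that case.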

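I was close to yours: df['id'] = df['fruit'].str.cat(df.groupby('fruit')['distance'].rank().astype(int).astype(str), '_'). +1

Nice solution! Worth noting that for items with the same distance, the default ranking method ('average') does not guarantee that the IDs are unique (or consecutive); the 'first' method is safer.

That's right, rank(method='first') is safer and ensures that all IDs are unique.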
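To illustrate the point about ties, here is a small sketch using a hypothetical three-row frame (the ties DataFrame and its values are made up for this example) in which two apples share the same distance.

```python
import pandas as pd

# Hypothetical data: the first two rows are tied on distance.
ties = pd.DataFrame({'fruit': ['apple', 'apple', 'apple'],
                     'distance': [10, 10, 20]})

grouped = ties.groupby('fruit')['distance']

# Default method='average': the tied rows both get rank 1.5,
# which astype(int) truncates to 1, so the IDs collide.
avg_ids = ties['fruit'] + '_' + grouped.rank().astype(int).astype(str)

# method='first': ties are ranked in order of appearance, giving 1, 2, 3.
first_ids = ties['fruit'] + '_' + grouped.rank(method='first').astype(int).astype(str)

print(avg_ids.tolist())
print(first_ids.tolist())
```

With the default method the first two rows receive the same ID and the number 2 is skipped, while method='first' keeps the IDs both unique and consecutive.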
