Python: Is there a way to number duplicates within groups in a pandas DataFrame?
I have a DataFrame with a region, a client, and some deliveries.
The purchaseType column records the type of purchase: the first and last purchases are labeled "FIRST" and "LAST", but every delivery in between is labeled "DELIVERY".
Is there a way to transform the deliveries so that they get labels such as "DELIVERY1", "DELIVERY2", and so on?
Expected output:
import pandas as pd

# Expected result: intermediate deliveries are numbered within each FIRST ... LAST run
data2 = [
    ['NY', 'A', 'FIRST', 10], ['NY', 'A', 'DELIVERY1', 20], ['NY', 'A', 'DELIVERY2', 30], ['NY', 'A', 'LAST', 25],
    ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY1', 10], ['NY', 'B', 'LAST', 20],
    ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY1', 10], ['NY', 'A', 'DELIVERY2', 12], ['NY', 'A', 'DELIVERY3', 25], ['NY', 'A', 'LAST', 20],
]
df2 = pd.DataFrame(data2, columns=['Region', 'Client', 'purchaseType', 'price'])
print(df2)
Thanks in advance.
You can use np.where to decide where to add the numeric suffix:
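import numpy as np

# df is the input frame whose intermediate rows are all labelled 'DELIVERY'.
# A new group starts at every 'FIRST' row; 'DELIVERY' rows get their 0-based position in the group appended.
df['purchaseType'] = df.groupby((df['purchaseType']=='FIRST').cumsum())['purchaseType'].transform(
    lambda x: np.where(x=='DELIVERY', x+np.arange(len(x)).astype(str), x)
)
print(df)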
Prints:
   Region Client purchaseType  price
0      NY      A        FIRST     10
1      NY      A    DELIVERY1     20
2      NY      A    DELIVERY2     30
3      NY      A         LAST     25
4      NY      B        FIRST     15
5      NY      B    DELIVERY1     10
6      NY      B         LAST     20
7      FL      A        FIRST     15
8      FL      A    DELIVERY1     10
9      NY      A    DELIVERY2     12
10     NY      A    DELIVERY3     25
11     NY      A         LAST     20
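For anyone who wants to run this end to end, here is a minimal self-contained sketch of a closely related variant. The input frame is an assumption: the question only shows the expected output, so the rows below are that output with every intermediate label reset to plain 'DELIVERY'. Instead of np.where, the sketch keeps a running count of the 'DELIVERY' rows inside each block, so the numbering does not rely on 'FIRST' sitting at position 0 of its block. The names block, is_delivery and counter are illustrative only.

```python
import pandas as pd

# Assumed input (not given in the question): the expected output with the
# numeric suffixes removed from the intermediate rows.
data = [
    ['NY', 'A', 'FIRST', 10], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'DELIVERY', 30], ['NY', 'A', 'LAST', 25],
    ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY', 10], ['NY', 'B', 'LAST', 20],
    ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY', 10], ['NY', 'A', 'DELIVERY', 12], ['NY', 'A', 'DELIVERY', 25], ['NY', 'A', 'LAST', 20],
]
df = pd.DataFrame(data, columns=['Region', 'Client', 'purchaseType', 'price'])

# Every 'FIRST' row opens a new block; the running count of 'FIRST' rows is the block id.
block = (df['purchaseType'] == 'FIRST').cumsum()

# Mark the rows that need a suffix.
is_delivery = df['purchaseType'] == 'DELIVERY'

# Running count of 'DELIVERY' rows within each block: 1, 2, 3, ...
counter = is_delivery.astype(int).groupby(block).cumsum()

# Append the counter to the 'DELIVERY' rows only; other rows are left untouched.
df.loc[is_delivery, 'purchaseType'] = 'DELIVERY' + counter[is_delivery].astype(str)
print(df)
```

Both versions group on the cumulative count of 'FIRST' rows rather than on Region and Client, which is why the third block is numbered correctly even though its Region values are mixed.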