Pandas: generate a unique ID based on data duplication
I have a DataFrame like this:
ID,SUBJECT_CODE,SUBJECT_GROUP,CLASS_ID,CAMPUS_ID
1,g1,VP2K,c1,r1
2,g1,VP2K,c1,r1
3,g1,VP3K,c2,r2
4,g1,VP3K,c2,r2
5,g1,VP3K,c3,r3
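I have to maintain a column CORR_ID whose value is a unique UUID (uuid.uuid4().int) for every unique row, and the same UUID for duplicate rows. A row is considered a duplicate if it has the same CLASS_ID and CAMPUS_ID (subset=['CLASS_ID','CAMPUS_ID']).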
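To make the duplicate criterion concrete, here is a minimal sketch (assuming Python 3 and a recent pandas; the names `csv_text`, `key_cols`, `is_dup` and `group_no` are illustrative and not from the original post). It loads the sample data above and marks which rows share the same CLASS_ID and CAMPUS_ID:

```python
import io
import pandas as pd

# Sample data copied from the question
csv_text = """ID,SUBJECT_CODE,SUBJECT_GROUP,CLASS_ID,CAMPUS_ID
1,g1,VP2K,c1,r1
2,g1,VP2K,c1,r1
3,g1,VP3K,c2,r2
4,g1,VP3K,c2,r2
5,g1,VP3K,c3,r3
"""
df = pd.read_csv(io.StringIO(csv_text))

key_cols = ['CLASS_ID', 'CAMPUS_ID']

# keep=False flags every row whose key occurs more than once,
# not only the second and later occurrences
is_dup = df.duplicated(subset=key_cols, keep=False)

# ngroup() assigns one integer label per distinct key,
# so rows with equal keys receive the same label
group_no = df.groupby(key_cols).ngroup()

print(is_dup.tolist())    # [True, True, True, True, False]
print(group_no.tolist())  # [0, 0, 1, 1, 2]
```

Rows with the same group label are the ones that should end up with the same CORR_ID.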
Expected result (note that the 1st and 3rd rows have the same UUID because they share the same ['CLASS_ID','CAMPUS_ID']; likewise the 2nd and 4th rows):
ID,SUBJECT_CODE,SUBJECT_GROUP,CLASS_ID,CAMPUS_ID,CORR_ID
1,g1,VP2K,c1,r1,142313746482664936587190810281013480411
2,g1,VP3K,c2,r2,342313743483664636887990810281013450392
3,g1,VP2K,c1,r1,142313746482664936587190810281013480411
4,g1,VP3K,c2,r2,342313743483664636887990810281013450392
5,g1,VP3K,c3,r3,247313743481654636887998810278015678903
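So I would like to know whether there is a Pythonic way to do this. Thanks for your help.
For me, saving such large integers to a pandas column is a problem because it raises an OverflowError. A possible solution is to convert the values to Decimal: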
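import uuid
from decimal import Decimal
# One new UUID per group, wrapped in Decimal so the 128-bit integer can be stored
f = lambda x: Decimal(uuid.uuid4().int)
# transform broadcasts each group's single value to all rows of that group
df['CORR_ID'] = df.groupby(['CLASS_ID','CAMPUS_ID'])['CLASS_ID'].transform(f)
print(df)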
   ID SUBJECT_CODE SUBJECT_GROUP CLASS_ID CAMPUS_ID                                  CORR_ID
0   1           g1          VP2K       c1        r1  169638083186337734039542386251361973037
1   2           g1          VP2K       c1        r1  169638083186337734039542386251361973037
2   3           g1          VP3K       c2        r2  279310814212899708123352457215494669311
3   4           g1          VP3K       c2        r2  279310814212899708123352457215494669311
4   5           g1          VP3K       c3        r3  187655807105121612884740725825459107251
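If CORR_ID does not strictly have to be an integer, another way around the overflow might be to store each UUID as text. The sketch below is not part of the original answer. It swaps in a different technique (drop_duplicates plus merge instead of groupby/transform), assumes Python 3, and uses the illustrative names `keys` and `out`:

```python
import io
import uuid
import pandas as pd

# Sample data copied from the question
csv_text = """ID,SUBJECT_CODE,SUBJECT_GROUP,CLASS_ID,CAMPUS_ID
1,g1,VP2K,c1,r1
2,g1,VP2K,c1,r1
3,g1,VP3K,c2,r2
4,g1,VP3K,c2,r2
5,g1,VP3K,c3,r3
"""
df = pd.read_csv(io.StringIO(csv_text))

key_cols = ['CLASS_ID', 'CAMPUS_ID']

# One row per distinct (CLASS_ID, CAMPUS_ID) pair
keys = df[key_cols].drop_duplicates().copy()

# A fresh UUID per distinct pair, kept as a 36-character string
# so no 128-bit integer has to fit into a numeric column
keys['CORR_ID'] = [str(uuid.uuid4()) for _ in range(len(keys))]

# A left merge preserves the row order of df and copies each
# pair's UUID onto every row that has that pair
out = df.merge(keys, on=key_cols, how='left')
print(out)
```

The trade-off is that CORR_ID becomes a string column. If the integer form is needed later, `uuid.UUID(s).int` converts one of these strings back to a Python int.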