Python: assigning IDs to pairs in a DataFrame gives inconsistent results
I have a df:
import pandas as pd
df = pd.DataFrame({'src': ['LV', 'LA', 'NC', 'NY', 'ABC', 'XYZ'], 'dest': ['NC', 'NY', 'LV', 'LA', 'XYZ', 'ABC'], 'dummy': [1, 3, 6, 7, 8, 10]})
src  dest  dummy
LV   NC        1
LA   NY        3
NC   LV        6
NY   LA        7
ABC  XYZ       8
XYZ  ABC      10
I run it through the following code:
df['pair'] = df[['src', 'dest']].apply(lambda x : tuple(set(x)), 1).factorize()[0] + 1
to try to give each unique pair a key, treating a->b and b->a as the same pair.
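The `.factorize()[0] + 1` step itself is deterministic. It numbers each distinct value in order of first appearance, starting at 0. Values that are not equal get different codes. The small illustration below uses made-up toy values. It shows that two tuples holding the same endpoints in a different order get different codes, so the pairing can only be consistent if every row's key is built in one fixed order.

```python
import pandas as pd

# factorize() returns (codes, uniques); codes number each distinct value
# in order of first appearance, starting at 0.
codes, uniques = pd.factorize(pd.Series(['b', 'a', 'b', 'c', 'a']))
print(codes + 1)      # [1 2 1 3 2]
print(list(uniques))  # ['b', 'a', 'c']

# Tuples are compared element by element, so the same two endpoints
# in a different order count as two distinct values.
pair_codes, _ = pd.factorize(pd.Series([('ABC', 'XYZ'), ('XYZ', 'ABC')]))
print(pair_codes)     # [0 1]
```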
This correctly gives me:
src  dest  dummy  pair
LV   NC        1     1
LA   NY        3     2
NC   LV        6     1
NY   LA        7     2
ABC  XYZ       8     3
XYZ  ABC      10     3
However, sometimes when I run it, I get this wrong result instead:
src  dest  dummy  pair
LV   NC        1     1
LA   NY        3     2
NC   LV        6     1
NY   LA        7     2
ABC  XYZ       8     3
XYZ  ABC      10     4
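As you can see, for some reason the last row is not keyed to pair 3 as it should be. This happens at random. I can reproduce it by commenting out the "pair" line, running the script to build and print the df, then uncommenting the line and running again. Running with other modifications reproduces it in other ways too.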
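There is a plausible explanation for the "random" part, hedged because it rests on CPython internals. In CPython, `str` hashes are randomized per interpreter process unless `PYTHONHASHSEED` is set. A set's iteration order also follows its hash-table layout. So when two strings land in the same slot, the set built from ('ABC', 'XYZ') can iterate in a different order from the one built from ('XYZ', 'ABC'), and `tuple(set(x))` then produces two different keys.

The sketch below runs a one-line check in fresh interpreters under a range of fixed seeds and records which seeds give mismatched tuples. The seed range, `ORDER_CHECK` and `same_order_under_seed` are arbitrary, hypothetical choices made for this illustration.

```python
import os
import subprocess
import sys

# One-line check run in a child interpreter: do two sets with the same
# elements, built in opposite orders, turn into the same tuple?
ORDER_CHECK = "print(tuple(set(['ABC', 'XYZ'])) == tuple(set(['XYZ', 'ABC'])))"


def same_order_under_seed(seed):
    """Run ORDER_CHECK in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", ORDER_CHECK],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "True"


results = {seed: same_order_under_seed(seed) for seed in range(32)}
mismatches = [seed for seed, same in results.items() if not same]
print(f"{len(mismatches)} of {len(results)} seeds gave different tuples: {mismatches}")
```

Any seeds listed in `mismatches` are ones under which `tuple(set(x))` would key the ABC/XYZ rows differently. A fixed seed makes the outcome repeatable. Any key built from the iteration order of a set or frozenset shares this weakness, while `tuple(sorted(x))` does not.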
How can I fix this non-deterministic behavior?
Try this. The problem comes from set; you could change it to frozenset.
import numpy as np
df['pair'] = pd.DataFrame(np.sort(df[['src', 'dest']].values, 1)).agg(tuple, 1).factorize()[0] + 1
Out[108]: array([1, 2, 1, 2, 3, 3], dtype=int64)
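The np.sort answer works because sorting along axis 1 puts each row's two endpoints in a fixed order before the tuples are built. Below is a vectorized variant of the same idea. It assumes exactly two string columns named src and dest, as in the question.

It is my own swap of techniques, not the answer's code. `Series.where` picks the smaller and larger endpoint, and `GroupBy.ngroup` with `sort=False` numbers the groups in order of first appearance, which matches what `factorize` does.

```python
import pandas as pd

df = pd.DataFrame({'src': ['LV', 'LA', 'NC', 'NY', 'ABC', 'XYZ'],
                   'dest': ['NC', 'NY', 'LV', 'LA', 'XYZ', 'ABC'],
                   'dummy': [1, 3, 6, 7, 8, 10]})

in_order = df['src'] <= df['dest']          # True where src is already the smaller label
lo = df['src'].where(in_order, df['dest'])  # smaller endpoint of each pair
hi = df['dest'].where(in_order, df['src'])  # larger endpoint of each pair

# sort=False numbers groups in order of first appearance, like factorize()
df['pair'] = df.groupby([lo, hi], sort=False).ngroup() + 1
print(df['pair'].tolist())  # [1, 2, 1, 2, 3, 3]
```

This avoids building a Python tuple per row. The trade-off is that it only handles two columns, whereas np.sort extends to wider rows.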
Thanks. Where exactly is frozenset used in the code above?
@reeeeee I meant it as a fix for your code:
df['pair'] = df[['src', 'dest']].apply(lambda x: frozenset(x), axis=1).factorize()[0] + 1  # frozensets compare equal regardless of element order
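Whichever container is used, the underlying requirement is a key that does not depend on iteration order. Here is a minimal sketch, assuming the question's df. It keeps the original apply/factorize structure and only swaps `tuple(set(x))` for `tuple(sorted(x))`. That swap is my own suggestion rather than code from the answer or the comments.

```python
import pandas as pd

df = pd.DataFrame({'src': ['LV', 'LA', 'NC', 'NY', 'ABC', 'XYZ'],
                   'dest': ['NC', 'NY', 'LV', 'LA', 'XYZ', 'ABC'],
                   'dummy': [1, 3, 6, 7, 8, 10]})

# sorted() always returns the two endpoints in the same order, so
# ('XYZ', 'ABC') and ('ABC', 'XYZ') both become ('ABC', 'XYZ') on every run.
keys = df[['src', 'dest']].apply(lambda row: tuple(sorted(row)), axis=1)
df['pair'] = keys.factorize()[0] + 1
print(df['pair'].tolist())  # [1, 2, 1, 2, 3, 3]
```

A sorted tuple keeps the key readable and orderable, which can help if the pairs are later used as group labels.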