Python 3.x: select rows that have the same value in one column but different values in another


python-3.x pandas


I have some duplicates in my data that I need to correct.

Here is a sample of the dataframe:

    import pandas as pd

    test = pd.DataFrame({'event_id': ['1', '1', '2', '3', '5', '6', '9', '3', '9', '10'],
                         'user_id': [0, 0, 0, 1, 1, 3, 3, 4, 4, 4],
                         'index': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

I need to select all rows that have the same value in event_id but different values in user_id. I tried this (based on a similar question, which has no accepted answer):

But I don't need the first rows, whose user_id is the same.
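To illustrate where the unwanted rows come from, here is a plain duplicate check on event_id alone. This is only an illustration and not necessarily the attempt referred to above: because it ignores user_id, it also returns rows 0 and 1 (event_id '1'), which both belong to user 0.

```python
import pandas as pd

test = pd.DataFrame({'event_id': ['1', '1', '2', '3', '5', '6', '9', '3', '9', '10'],
                     'user_id': [0, 0, 0, 1, 1, 3, 3, 4, 4, 4],
                     'index': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# keep=False flags every occurrence of a repeated event_id, not just the later ones
dup_event = test[test.duplicated('event_id', keep=False)]

# The result still contains rows 0 and 1, which share both event_id and user_id
print(dup_event)
```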

The second part of the question: what is the best way to correct the duplicate records? How can I add a suffix (_new) to event_id, but only in these rows:

    event_id    user_id index
3   3_new       1       40
6   9_new       3       70
7   3           4       80
8   9           4       90

Try:

Output:

  event_id  user_id  index
3        3        1     40
6        9        3     70
7        3        4     80
8        9        4     90

Hmm, I tried modifying your code:

    # Keep event_id groups that have more than one row and a different user_id in every row
    (test.groupby('event_id')
         .filter(lambda x: len(x) > 1 and len(x) == x['user_id'].nunique()))

  event_id  user_id  index
3        3        1     40
6        9        3     70
7        3        4     80
8        9        4     90
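A vectorized alternative, sketched here and not taken from the thread, broadcasts the number of distinct users per event_id back onto every row with transform('nunique') and filters with a boolean mask instead of calling a lambda per group. Note the slightly different rule: it keeps any event_id shared by at least two different users, even if one user repeats inside the group (e.g. users [1, 1, 4]), whereas the filter above requires every user_id in the group to be distinct. For the sample data both give rows 3, 6, 7 and 8.

```python
import pandas as pd

test = pd.DataFrame({'event_id': ['1', '1', '2', '3', '5', '6', '9', '3', '9', '10'],
                     'user_id': [0, 0, 0, 1, 1, 3, 3, 4, 4, 4],
                     'index': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Number of distinct users per event_id, aligned with the original rows
users_per_event = test.groupby('event_id')['user_id'].transform('nunique')

# Keep rows whose event_id is used by two or more different users
conflicts = test[users_per_event > 1]
print(conflicts)
```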

To correct the duplicate rows, you could create a new sub-key; personally, I would not recommend modifying the original column:

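    # Number each row within its event_id group: 0 for the first occurrence, 1 for the second, ...
    test['subkey'] = test.groupby('event_id').cumcount()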

No, a sub-key is not what I need. @ If this is about keeping the data safe: the sub-key does not generate "1" for the problematic rows...
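One possible way to get the suffixed table from the question, sketched here as an assumption and not as either participant's solution, is to combine the two ideas from this thread: a mask of event_ids shared by different users, and the cumcount position used for the sub-key. Only the first row (position 0) of each conflicting group gets "_new", which reproduces the expected output (rows 3 and 6). The names conflict, position and fixed are illustrative. With three or more users on one event_id, the rows after the first would still share an id, so a per-position suffix might be needed in that case.

```python
import pandas as pd

test = pd.DataFrame({'event_id': ['1', '1', '2', '3', '5', '6', '9', '3', '9', '10'],
                     'user_id': [0, 0, 0, 1, 1, 3, 3, 4, 4, 4],
                     'index': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# True for rows whose event_id is shared by two or more different users
conflict = test.groupby('event_id')['user_id'].transform('nunique') > 1

# Position of each row inside its event_id group (same values as the sub-key)
position = test.groupby('event_id').cumcount()

# Work on a copy so the original column is left untouched
fixed = test.copy()
to_rename = conflict & (position == 0)
fixed.loc[to_rename, 'event_id'] = fixed.loc[to_rename, 'event_id'] + '_new'
print(fixed)
```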
