Python: How can we write a function to get the row numbers of duplicate values and min(row number)?
Tags: python, pandas, function


Input DataFrame (df):

  name    job       id_number
0  krul    painter    125796 
1  tim     lawyer     789632
2  daisy   engg       256498
3  alex    dancer     456985
4  mandy   arch       456258
5  krul    painter    125796
6  tim     lawyer     789632
7  tim     lawyer     789632
8  tim     lawyer     789632
9  daisy   engg       256498
10 daisy   engg       256498
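
For anyone who wants to reproduce the frame above, here is a minimal sketch that rebuilds it. The values are copied from the table; the default integer index is an assumption based on the row labels 0 to 10 shown there.

```python
import pandas as pd

# Values copied from the table above; the default RangeIndex (0..10) is assumed.
df = pd.DataFrame({
    "name": ["krul", "tim", "daisy", "alex", "mandy", "krul",
             "tim", "tim", "tim", "daisy", "daisy"],
    "job": ["painter", "lawyer", "engg", "dancer", "arch", "painter",
            "lawyer", "lawyer", "lawyer", "engg", "engg"],
    "id_number": [125796, 789632, 256498, 456985, 456258, 125796,
                  789632, 789632, 789632, 256498, 256498],
})

print(df)
```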
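Expected output:

 dup_Index   min_index
     0            0
     5            0
     2            2
     9            2
    10            2
     6            6
     7            7
     8            8

IIUC, use duplicated with keep=False to select every row whose id_number repeats, then transform('idxmin') to get the smallest row number in each group:

# keep=False marks every member of a duplicate group, not only the later repeats
(df[df.duplicated('id_number', keep=False)]
    .groupby('id_number')['id_number'].transform('idxmin')
    .sort_values()
)

Output:
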
0     0
5     0
1     1
6     1
7     1
8     1
2     2
9     2
10    2
Name: id_number, dtype: int64
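
Since the question asks for a function, the same idea can be wrapped up as one. This is only a sketch: the function name duplicate_row_numbers and its subset parameter are made up for illustration, and the column names dup_Index and min_index are borrowed from the expected-output table in the question. It takes the minimum of the row labels directly instead of using idxmin, which gives the same result for this data.

```python
import pandas as pd

def duplicate_row_numbers(frame, subset):
    """Hypothetical helper: row label and smallest row label per duplicate group."""
    # Keep every row whose values in `subset` occur more than once.
    dup = frame[frame.duplicated(subset, keep=False)]
    # Store the row labels as a column so they can be aggregated.
    out = dup.assign(dup_Index=dup.index)
    # Smallest row label within each group of identical `subset` values.
    out["min_index"] = out.groupby(subset)["dup_Index"].transform("min")
    return (out[["dup_Index", "min_index"]]
            .sort_values(["min_index", "dup_Index"])
            .reset_index(drop=True))

df = pd.DataFrame({
    "name": ["krul", "tim", "daisy", "alex", "mandy", "krul",
             "tim", "tim", "tim", "daisy", "daisy"],
    "job": ["painter", "lawyer", "engg", "dancer", "arch", "painter",
            "lawyer", "lawyer", "lawyer", "engg", "engg"],
    "id_number": [125796, 789632, 256498, 456985, 456258, 125796,
                  789632, 789632, 789632, 256498, 256498],
})

result = duplicate_row_numbers(df, ["id_number"])
print(result)
```

Passing a list of several columns, for example ["name", "job", "id_number"], treats rows as duplicates only when all of those columns match.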

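Although I can't tell from the question what the intent behind the grouping is, if you want to see each unique record together with the indices of its duplicates, you can always fall back on a groupby:

df.groupby(['name', 'job', 'id_number'], as_index=True).apply(lambda x: x.index.tolist())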

You can then apply various queries to get the length of each list and its first element.
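
As a rough sketch of those follow-up queries: the snippet below collects the row labels per unique record and then derives the list length and the smallest label. It uses agg(list) on the index instead of apply, which is a swapped-in variant rather than the exact call above, and the names rows, summary, count and min_index are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["krul", "tim", "daisy", "alex", "mandy", "krul",
             "tim", "tim", "tim", "daisy", "daisy"],
    "job": ["painter", "lawyer", "engg", "dancer", "arch", "painter",
            "lawyer", "lawyer", "lawyer", "engg", "engg"],
    "id_number": [125796, 789632, 256498, 456985, 456258, 125796,
                  789632, 789632, 789632, 256498, 256498],
})

# One list of row labels per unique (name, job, id_number) combination.
rows = (df.index.to_series()
          .groupby([df["name"], df["job"], df["id_number"]])
          .agg(list))

summary = pd.DataFrame({
    "rows": rows,
    "count": rows.map(len),      # how many times the record occurs
    "min_index": rows.map(min),  # smallest row label of the record
})

# Records that occur more than once.
duplicates_only = summary[summary["count"] > 1]
print(duplicates_only)
```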


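Depending on your needs there may be better approaches, such as @Quang Hoang's answer.

Comments:

It's not clear to me, can you explain more?

I need to check which row numbers are duplicates. For example, in my df rows 0 and 5 are exact duplicates. I need index numbers 0 and 5, together with the minimum of those index numbers, i.e. 0. The output should look like my output table.

What if we don't have a unique-valued 'id_number' column? Can we create a unique value column for duplicate records?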
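
On the last comment, one possible approach (a sketch, not something confirmed in the thread) is to number each distinct combination of column values with groupby(...).ngroup() and use that number as the key. The column name record_key is hypothetical, and the sketch assumes there are no missing values in the grouped columns.

```python
import pandas as pd

# Same records as in the question, but without the id_number column.
df = pd.DataFrame({
    "name": ["krul", "tim", "daisy", "alex", "mandy", "krul",
             "tim", "tim", "tim", "daisy", "daisy"],
    "job": ["painter", "lawyer", "engg", "dancer", "arch", "painter",
            "lawyer", "lawyer", "lawyer", "engg", "engg"],
})

# Number each distinct (name, job) combination in order of first appearance.
# "record_key" is a made-up column name used only for this illustration.
key_columns = ["name", "job"]
df["record_key"] = df.groupby(key_columns, sort=False).ngroup()

# The generated key can then stand in for id_number.
dup = df[df.duplicated("record_key", keep=False)]
min_index = (dup.index.to_series()
                .groupby(dup["record_key"])
                .transform("min")
                .sort_values())
print(min_index)
```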