Python: how do I handle duplicate entries in a pandas DataFrame?

I have a df with the following entries:

ID         FIRST_NAME  FIRST_SUBJECT  SECOND_SUBJECT
A2035559   Sometsdf    Science
A2035559   Sometsdf    ENGINEERING
A20340619  Nsdsjes     MATHS
A20340619  Nsdsjes     SCIENCE
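I want to identify duplicate rows by the "ID" column and drop them, but move the FIRST_SUBJECT value from each dropped row into the SECOND_SUBJECT column of the original row, so that I end up with this: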

ID         FIRST_NAME  FIRST_SUBJECT  SECOND_SUBJECT
A2035559   Sometsdf    Science        ENGINEERING
A20340619  Nsdsjes     MATHS          SCIENCE
This seems very tricky to me. I started by trying to sort the DataFrame by "ID", but all my IDs start with "A", so that didn't work. How can I achieve this?
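The reshaping described above can be sketched with `groupby`, which collects the rows for each ID without the frame having to be sorted first. This is a minimal sketch, not the accepted answer, assuming pandas 0.25 or later (for named aggregation) and at most two rows per ID as stated in the question; the helper `second_or_none` is hypothetical.

```python
import pandas as pd

# Sample data from the question (SECOND_SUBJECT starts out empty, so it is omitted)
df = pd.DataFrame({
    'ID': ['A2035559', 'A2035559', 'A20340619', 'A20340619'],
    'FIRST_NAME': ['Sometsdf', 'Sometsdf', 'Nsdsjes', 'Nsdsjes'],
    'FIRST_SUBJECT': ['Science', 'ENGINEERING', 'MATHS', 'SCIENCE'],
})

def second_or_none(s):
    # Hypothetical helper: value from the group's second row, or None if the ID appears once
    return s.iloc[1] if len(s) > 1 else None

# One output row per ID; sort=False keeps the IDs in order of first appearance
result = (
    df.groupby('ID', sort=False)
      .agg(FIRST_NAME=('FIRST_NAME', 'first'),
           FIRST_SUBJECT=('FIRST_SUBJECT', 'first'),
           SECOND_SUBJECT=('FIRST_SUBJECT', second_or_none))
      .reset_index()
)
print(result)
```

Using a custom function rather than the built-in `'last'` aggregation avoids copying FIRST_SUBJECT into SECOND_SUBJECT for IDs that have only one row.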

I have another idea that I'm trying:

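I created two copies of the DataFrame, df1 and df2. Since each ID appears at most twice (i.e. there are at most two copies of the same row), I remove the duplicates by keeping the first occurrence in df1 and the last occurrence in df2, and then try to combine the two: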

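  df1 = df.drop_duplicates('ID', keep='first').copy()  # first occurrence of each ID
  df2 = df.drop_duplicates('ID', keep='last')          # last occurrence of each ID

  # df1 and df2 keep different row-index labels, so align on ID rather than on the index
  df1['SECOND_SUBJECT'] = df1['ID'].map(df2.set_index('ID')['FIRST_SUBJECT'])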

Will this work?
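A variant of the two-copy idea joins on ID explicitly with `merge`, and takes the "second" subject only from rows that are not the first occurrence of their ID, so IDs with a single row end up with an empty SECOND_SUBJECT instead of a repeated subject. This is a minimal sketch assuming at most two rows per ID (with three or more, the merge would produce extra rows); the names `first` and `extra` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A2035559', 'A2035559', 'A20340619', 'A20340619'],
    'FIRST_NAME': ['Sometsdf', 'Sometsdf', 'Nsdsjes', 'Nsdsjes'],
    'FIRST_SUBJECT': ['Science', 'ENGINEERING', 'MATHS', 'SCIENCE'],
    'SECOND_SUBJECT': [None, None, None, None],
})

# First occurrence of every ID: these are the rows to keep
first = df.drop_duplicates('ID', keep='first')

# Rows that are NOT the first occurrence of their ID (the rows to drop),
# reduced to the ID and its subject, renamed to the target column
extra = (
    df.loc[df.duplicated('ID', keep='first'), ['ID', 'FIRST_SUBJECT']]
      .rename(columns={'FIRST_SUBJECT': 'SECOND_SUBJECT'})
)

# Join on ID instead of relying on the row index; IDs with a single row get NaN
result = first.drop(columns='SECOND_SUBJECT').merge(extra, on='ID', how='left')
print(result)
```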

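I'm not really a Python or pandas developer, so don't take this as the "right" way (I'm fairly sure it isn't; I'm a little wary of the FIRST/SECOND subject approach compared with a more general mapping pattern), and it doesn't scale well to three or more subjects: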

import pandas as pd

data = {
    'ID': ['A2035559', 'A20340619', 'A2035559', 'A20340619'],
    'FIRST_NAME': ['Sometsdf', 'Nsdsjes', 'Sometsdf', 'Nsdsjes'],
    'FIRST_SUBJECT': ['SCIENCE', 'MATHS', 'ENGINEERING', 'SCIENCE'],
    'SECOND_SUBJECT': [None, None, None, None]
}

d = pd.DataFrame(data=data, columns=['ID', 'FIRST_NAME', 'FIRST_SUBJECT', 'SECOND_SUBJECT'])

# keep='last' / keep='first' replace the removed take_last=True / take_last=False
last_rows = d.drop_duplicates(subset=['ID'], keep='last').copy()  # last occurrence of each ID
first_rows = d.drop_duplicates(subset=['ID'], keep='first')       # first occurrence of each ID

# Copy each ID's first-occurrence subject into SECOND_SUBJECT of its last-occurrence row
for id_value in first_rows['ID']:
    subject = first_rows.loc[first_rows['ID'] == id_value, 'FIRST_SUBJECT'].values[0]
    last_rows.loc[last_rows['ID'] == id_value, 'SECOND_SUBJECT'] = subject

last_rows
which produces:

ID         FIRST_NAME  FIRST_SUBJECT  SECOND_SUBJECT
A2035559   Sometsdf    ENGINEERING    SCIENCE
A20340619  Nsdsjes     SCIENCE        MATHS
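One possible "more general mapping pattern" that scales past two subjects is to number the rows within each ID with `groupby(...).cumcount()` and pivot those numbers into columns. This is a sketch under stated assumptions: the third subject `'ART'` is hypothetical data added only to show a third column, the `SUBJECT_1`, `SUBJECT_2`, ... column names are illustrative, and each ID is assumed to have a single FIRST_NAME.

```python
import pandas as pd

# The answer's sample data, plus a hypothetical third subject ('ART') for A2035559
df = pd.DataFrame({
    'ID': ['A2035559', 'A20340619', 'A2035559', 'A20340619', 'A2035559'],
    'FIRST_NAME': ['Sometsdf', 'Nsdsjes', 'Sometsdf', 'Nsdsjes', 'Sometsdf'],
    'FIRST_SUBJECT': ['SCIENCE', 'MATHS', 'ENGINEERING', 'SCIENCE', 'ART'],
})

# Number each row within its ID: 0 for the first occurrence, 1 for the second, ...
df['n'] = df.groupby('ID').cumcount()

# One column per position; IDs with fewer subjects get NaN in the later columns
subjects = df.pivot(index='ID', columns='n', values='FIRST_SUBJECT')
subjects.columns = [f'SUBJECT_{i + 1}' for i in subjects.columns]  # illustrative names

# Re-attach the per-ID name (assumed identical across an ID's rows), in first-seen ID order
names = df.drop_duplicates('ID').set_index('ID')['FIRST_NAME']
result = names.to_frame().join(subjects).reset_index()
print(result)
```

Adding a fourth subject for an ID simply produces a `SUBJECT_4` column; no code changes are needed.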