Python: How do I merge duplicate rows in a DataFrame that hold different information or NaN into a single row?

I have a DataFrame that looks like this:

   Name           Email Assessment 1 Assessment 2 Assessment 3 Assessment 4 Assessment 5
0   abc   abc@email.com         Good          NaN          NaN          NaN          NaN
1   abc   abc@email.com          NaN         Good         Good          NaN          NaN
2   abc   abc@email.com          NaN          NaN          NaN         Good          NaN
3   abc   abc@email.com          NaN          NaN          NaN          NaN         Good
4  john  john@email.com         Good         Good         Fail          NaN          NaN
5  john  john@email.com          NaN          NaN         Good          NaN          NaN
6  john  john@email.com          NaN          NaN          NaN         Good         Good
7   joe   joe@email.com         Good         Good         Fail         Fail          NaN
8   joe   joe@email.com          NaN          NaN         Fail         Good         Good
9   joe   joe@email.com          NaN          NaN         Fail          NaN          NaN

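I am trying to merge the duplicate records, using Email as the key, and keep the non-missing information from the later rows as the final value. For the example above, this is my expected output: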

   Name           Email Assessment 1 Assessment 2 Assessment 3 Assessment 4 Assessment 5
0   abc   abc@email.com         Good         Good         Good         Good         Good
1  john  john@email.com         Good         Good         Good         Good         Good
2   joe   joe@email.com         Good         Good         Fail         Good         Good

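I have seen many solutions here for combining rows, but most of them concatenate the contents, i.e. they produce one value per email such as "Good Good" or "Good Good Fail", which is not what the sample output above shows. Please help.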


Sample data

import numpy as np
import pandas as pd

data_dict = pd.DataFrame({'Name': ['abc','abc','abc','abc','john','john','john','joe','joe','joe'],
             'Email': ['abc@email.com','abc@email.com','abc@email.com','abc@email.com','john@email.com','john@email.com','john@email.com','joe@email.com','joe@email.com','joe@email.com'],
             'Assessment 1': ['Good', np.nan, np.nan, np.nan, 'Good', np.nan, np.nan, 'Good', np.nan, np.nan],
             'Assessment 2': [np.nan,'Good',np.nan,np.nan,'Good',np.nan,np.nan,'Good',np.nan,np.nan],
             'Assessment 3': [np.nan,'Good',np.nan,np.nan,'Fail','Good',np.nan,'Fail','Fail','Fail'],
             'Assessment 4': [np.nan,np.nan,'Good',np.nan,np.nan,np.nan,'Good','Fail','Good',np.nan],
             'Assessment 5': [np.nan,np.nan,np.nan,'Good',np.nan,np.nan,'Good','Fail','Good',np.nan]} )

If you need the last of the sorted unique non-missing values per group, use:

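df = pd.DataFrame(data_dict)

def f(x):
    # Alphabetically last of the group's unique non-missing values
    try:
        return np.sort(x.dropna().unique())[-1]
    except IndexError:
        # The group has no non-missing value in this column
        return np.nan

# Collapse each (Name, Email) group into one row, keeping first-seen group order
df = df.groupby(['Name','Email'], as_index=False, sort=False).agg(f)
print (df)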
   Name           Email Assessment 1 Assessment 2 Assessment 3 Assessment 4  \
0   abc   abc@email.com         Good         Good         Good         Good   
1  john  john@email.com         Good         Good         Good         Good   
2   joe   joe@email.com         Good         Good         Fail         Good   

  Assessment 5  
0         Good  
1         Good  
2         Good  
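
To make the behaviour of the aggregation function above more concrete, here is a minimal sketch applied to single columns rather than to the grouped frame. The name last_sorted_unique is hypothetical and the explicit length check replaces the try/except; it is meant as an illustration, not as the answer's exact code. Note that the result depends on alphabetical order, not on row order, so "Good" wins over "Fail" wherever it appears.

```python
import numpy as np
import pandas as pd

def last_sorted_unique(values: pd.Series):
    """Return the alphabetically last non-missing value, or NaN if there is none."""
    present = values.dropna().unique()
    if len(present) == 0:
        # Every entry in this column was missing
        return np.nan
    return np.sort(present)[-1]

# Row order does not matter: both calls return "Good"
print(last_sorted_unique(pd.Series(["Fail", np.nan, "Good"])))
print(last_sorted_unique(pd.Series(["Good", np.nan, "Fail"])))
# An all-missing column yields NaN
print(last_sorted_unique(pd.Series([np.nan, np.nan])))
```

If the most recent row should win instead, an order-aware aggregation is needed, because sorting discards the original row order.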
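Edit:

If you need the last non-missing value per group, use:

# Start again from the original, unaggregated data
df = pd.DataFrame(data_dict)

df = df.groupby(['Name','Email'], as_index=False, sort=False).last()
print (df)
   Name           Email Assessment 1 Assessment 2 Assessment 3 Assessment 4  \
0   abc   abc@email.com         Good         Good         Good         Good   
1  john  john@email.com         Good         Good         Good         Good   
2   joe   joe@email.com         Good         Good         Fail         Good   

  Assessment 5  
0         Good  
1         Good  
2         Good  

Comments:

What is the logic if the values in a group are Fail, Good?

In any case, the result from the most recent row should win. For Fail-Good, keep Good, which is in the later row; by the same logic, for Good-Fail, Fail would be the value in the combined row.

OK, then I have edited the answer.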
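
The rule that the most recent row wins can be checked on a small case where a later Fail follows an earlier Good. The sketch below uses a hypothetical two-person frame (toy, with made-up names and emails, not the data from the question) and groups on Email alone, which is the key named in the question; it assumes the rows are already in chronological order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, chosen so that row order changes the result
toy = pd.DataFrame({
    "Name": ["ann", "ann", "ann", "bob", "bob"],
    "Email": ["ann@email.com"] * 3 + ["bob@email.com"] * 2,
    "Assessment 1": ["Good", "Fail", np.nan, np.nan, "Good"],
    "Assessment 2": [np.nan, np.nan, np.nan, "Fail", "Good"],
})

# last() skips missing values, so each column keeps the value from the
# latest row in the group that actually has one
merged = toy.groupby("Email", as_index=False, sort=False).last()
print(merged)
```

For ann, Assessment 1 becomes Fail because Fail is the latest non-missing entry, and Assessment 2 stays missing because no row in the group has a value. For bob, both assessments become Good.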