Python DataFrame error - ValueError: Can only compare identically-labeled Series objects
I have two DataFrames, as shown below:
df_name:
Student_ID Name DOB
0 1 Raju 1993-02-02
1 2 Indu 1987-01-04
2 3 Laya 2000-06-24
The task is to create a DataFrame (as below) in which I need to add df_marks['Int1/40'] and df_marks['Int2/40'] where df_name['Student_ID'] == df_marks['Student_ID']:
Student_id Name DOB Tam/50
0 1 Raju 1993-02-02 NaN
1 2 Indu 1987-01-04 NaN
2 3 Laya 2000-06-24 NaN
I tried:
df_out['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1).where(df_marks['Subject']==df_out['Student_id'])
but it gives the error:
ValueError: Can only compare identically-labeled Series objects
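Why this happens: `==` between two Series does not align them by label. pandas raises this ValueError unless both Series have exactly the same index labels. Presumably df_marks (which may hold several rows per student) and df_out have different indexes here. A minimal reproduction with two small hypothetical Series, not the question's data:

```python
import pandas as pd

# Two hypothetical Series whose index labels differ (3 rows vs 4 rows)
left = pd.Series([1, 2, 3], index=[0, 1, 2])
right = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3])

message = None
try:
    left == right  # element-wise comparison requires identical index labels
except ValueError as exc:
    message = str(exc)

print(message)  # Can only compare identically-labeled Series objects

# With identical labels the comparison works and returns a boolean Series
print((left == left).tolist())  # [True, True, True]
```

Matching rows by Student_ID is a lookup (merge, join or map) rather than an element-wise comparison.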
Is there a simple way to do this?
Regards,
Deepak Dash
For the new column in df_name, use join with the aggregated sum:
df_marks['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1)
df_name = df_name.join(df_marks.groupby('Student_ID')['Tam/50'].sum(), on='Student_ID')
print (df_name)
Student_ID Name DOB Tam/50
0 1 Raju 1993-02-02 198
1 2 Indu 1987-01-04 204
2 3 Laya 2000-06-24 123
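A self-contained sketch of the join approach above. The question never shows df_marks, so its rows here (including the Subject values) are hypothetical and the totals will not match 198/204/123:

```python
import pandas as pd

# df_name as shown in the question
df_name = pd.DataFrame({
    'Student_ID': [1, 2, 3],
    'Name': ['Raju', 'Indu', 'Laya'],
    'DOB': pd.to_datetime(['1993-02-02', '1987-01-04', '2000-06-24']),
})

# Hypothetical df_marks: made-up rows, student 1 has two subjects
df_marks = pd.DataFrame({
    'Student_ID': [1, 1, 2, 3],
    'Subject': ['Maths', 'Science', 'Maths', 'Maths'],
    'Int1/40': [30, 25, 40, 20],
    'Int2/40': [35, 28, 38, 22],
})

# Helper column: row-wise total of the two internal marks
df_marks['Tam/50'] = df_marks[['Int1/40', 'Int2/40']].sum(axis=1)

# One total per student: a Series named 'Tam/50' indexed by Student_ID
totals = df_marks.groupby('Student_ID')['Tam/50'].sum()

# on='Student_ID' matches df_name's column against the index of totals
df_name = df_name.join(totals, on='Student_ID')
print(df_name)
```

Because join matches on the key instead of row position, the differing row counts of the two frames no longer matter.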
Or a solution without a helper column:
s = (df_marks[['Int1/40','Int2/40']].sum(axis=1)
.groupby(df_marks['Student_ID'])
.sum()
.rename('Tam/50'))
df_name = df_name.join(s, on='Student_ID')
print (df_name)
Student_ID Name DOB Tam/50
0 1 Raju 1993-02-02 198
1 2 Indu 1987-01-04 204
2 3 Laya 2000-06-24 123
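A swapped-in alternative to join (not part of the answer above): the same per-student Series `s` can be looked up with Series.map, which matches each Student_ID against the index of `s`. df_marks is again a hypothetical stand-in:

```python
import pandas as pd

df_name = pd.DataFrame({
    'Student_ID': [1, 2, 3],
    'Name': ['Raju', 'Indu', 'Laya'],
})

# Hypothetical df_marks (the question does not show the real one)
df_marks = pd.DataFrame({
    'Student_ID': [1, 1, 2, 3],
    'Int1/40': [30, 25, 40, 20],
    'Int2/40': [35, 28, 38, 22],
})

# Row totals, then summed per student -> Series indexed by Student_ID
s = (df_marks[['Int1/40', 'Int2/40']].sum(axis=1)
     .groupby(df_marks['Student_ID'])
     .sum())

# map looks up each Student_ID in s's index; ids absent from df_marks become NaN
df_name['Tam/50'] = df_name['Student_ID'].map(s)
print(df_name)
```

map only adds one column and leaves df_name's other columns untouched, which can be convenient when assigning into an existing frame such as df_out.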
You can use pd.merge to match the two DataFrames on Student_ID, then use groupby and sum:
import pandas as pd
# Match each student's rows in df_marks by Student_ID
res = pd.merge(df_name, df_marks, on='Student_ID')
# Sum both internal marks per student
r = res.groupby(['Student_ID', 'Name', 'DOB'])[['Int1/40', 'Int2/40']].sum().reset_index()
r['Tam/50'] = r['Int1/40'] + r['Int2/40']
# pandas 2.0+ no longer accepts axis positionally in drop, so name the columns
r = r.drop(columns=['Int1/40', 'Int2/40'])
print(r)
Student_ID Name DOB Tam/50
0 1 Raju 1993-02-02 198
1 2 Indu 1987-01-04 204
2 3 Laya 2000-06-24 123
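One caveat with the merge approach: pd.merge defaults to an inner join, so a student with no rows in df_marks disappears from the result. A sketch using how='left' to keep every student; student 4 ("Ravi") and all df_marks rows are hypothetical:

```python
import pandas as pd

df_name = pd.DataFrame({
    'Student_ID': [1, 2, 3, 4],  # student 4 is hypothetical and has no marks
    'Name': ['Raju', 'Indu', 'Laya', 'Ravi'],
})

# Hypothetical df_marks with no rows for student 4
df_marks = pd.DataFrame({
    'Student_ID': [1, 1, 2, 3],
    'Int1/40': [30, 25, 40, 20],
    'Int2/40': [35, 28, 38, 22],
})

# how='left' keeps every row of df_name; the default how='inner' would drop student 4
res = pd.merge(df_name, df_marks, on='Student_ID', how='left')
res['Tam/50'] = res['Int1/40'] + res['Int2/40']

# min_count=1 keeps NaN (instead of 0) for a student with no marks at all
r = res.groupby(['Student_ID', 'Name'], as_index=False)['Tam/50'].sum(min_count=1)
print(r)
```

Whether a missing student should show NaN, 0, or be dropped is a judgment call that depends on what df_out is meant to represent.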
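What is df_out? Why are you comparing df_marks['Subject'] == df_out['Student_id']? Please edit your question to show the correct expected output.
Basically, df_out is my output DataFrame; I need to add the columns ('Int1/40', 'Int2/40') when the Student_ID matches.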