How to compare cell values of two different dataframes in Python?
I have two dataframes:
Person_df
Name Emplid Country
0 DK 123 India
1 JS 456 India
2 RM 789 China
3 MS 111 China
4 SR 222 China
Target_df
Country Category Target
0 India Marketing Reduce spend by $xy.
1 India R&D Increase spend by $dd.
2 India Infra Reduce spend by $kn.
3 China Marketing Increase spend by $eg.
4 China R&D Increase spend by $cb.
5 China Infra Reduce spend by $mn.
My goal is to create a third dataframe based on each person's country, as follows:
Individual_df
TargetID Category Target
DK12301 Marketing Reduce spend by $xy.
DK12302 R&D Increase spend by $dd.
DK12303 Infra Reduce spend by $kn.
JS45601 Marketing Reduce spend by $xy.
JS45602 R&D Increase spend by $dd.
JS45603 Infra Reduce spend by $kn.
RM78901 Marketing Increase spend by $eg.
RM78902 R&D Increase spend by $cb.
RM78903 Infra Reduce spend by $mn.
MS11101 Marketing Increase spend by $eg.
MS11102 R&D Increase spend by $cb.
MS11103 Infra Reduce spend by $mn.
SR22201 Marketing Increase spend by $eg.
SR22202 R&D Increase spend by $cb.
SR22203 Infra Reduce spend by $mn.
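To make the question reproducible, here is a minimal sketch that rebuilds both input frames from the tables above and spells out the TargetID pattern visible in the desired output: Name, then Emplid, then a two-digit counter that restarts for each person. Storing Emplid as an integer and the helper name `make_target_id` are assumptions for illustration only.

```python
import pandas as pd

# Rebuild the two input frames from the tables in the question
# (assumption: Emplid is stored as an integer)
Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],
    "Country": ["India", "India", "China", "China", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India"] * 3 + ["China"] * 3,
    "Category": ["Marketing", "R&D", "Infra"] * 2,
    "Target": [
        "Reduce spend by $xy.", "Increase spend by $dd.", "Reduce spend by $kn.",
        "Increase spend by $eg.", "Increase spend by $cb.", "Reduce spend by $mn.",
    ],
})


def make_target_id(name, emplid, seq):
    # Hypothetical helper: Name + Emplid + 1-based counter padded to two digits
    return f"{name}{emplid}{seq:02d}"


print(make_target_id("DK", 123, 1))  # DK12301
```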
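Basically, I have to pick a person from Person_df, match his/her country with the country in Target_df, and then assign each matching target to that person (storing the result in Individual_df).
The problem is that I'm new to Python and don't know how to do this country comparison.
I wrote the following code: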
for index, row in Person_df.iterrows():
    for index1, row1 in Target_df.iterrows():
        if Person_df['country'] == Person_df['country']:  # I know this is incorrect
            data = []
            # populate data[] with selected values for one person
            # append data[] to Individual_df
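I need help with the following:
1) How can I do the comparison against each person's country here?
2) Even if I knew how to do the comparison, the code I wrote is inefficient, because it performs a lot of unnecessary iterations. Any suggestions on how to improve it?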
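For question 1, a minimal sketch of the same nested loop with the comparison fixed: inside the loops, compare the scalar cell values of the current rows (`person["Country"] == target["Country"]`) rather than whole columns, which is what the commented line was attempting. Collecting plain dicts in a list and building the DataFrame once at the end avoids growing a DataFrame inside the loop. The variable names here are illustrative, and the loop is still O(people × targets), so it does not solve question 2.

```python
import pandas as pd

Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],
    "Country": ["India", "India", "China", "China", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India"] * 3 + ["China"] * 3,
    "Category": ["Marketing", "R&D", "Infra"] * 2,
    "Target": ["Reduce spend by $xy.", "Increase spend by $dd.", "Reduce spend by $kn.",
               "Increase spend by $eg.", "Increase spend by $cb.", "Reduce spend by $mn."],
})

rows = []  # collect plain dicts; build the DataFrame once at the end
for _, person in Person_df.iterrows():
    seq = 0  # per-person counter for the TargetID suffix
    for _, target in Target_df.iterrows():
        # Compare the two scalar cell values, not the whole columns
        if person["Country"] == target["Country"]:
            seq += 1
            rows.append({
                "TargetID": f"{person['Name']}{person['Emplid']}{seq:02d}",
                "Category": target["Category"],
                "Target": target["Target"],
            })

Individual_df = pd.DataFrame(rows, columns=["TargetID", "Category", "Target"])
print(Individual_df)
```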
Thanks.
Try this:
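import pandas as pd

# Left join: every person gets one row per target defined for their country
Individual_df = pd.merge(Person_df, Target_df, on=['Country'], how='left')
# TargetID = Name + Emplid + 1-based, zero-padded counter within each person
Individual_df['TargetID'] = Individual_df['Name'] + Individual_df['Emplid'].astype(str) + ((Individual_df.groupby('Emplid').cumcount() + 1).astype(str).str.zfill(2))
Individual_df = Individual_df[['TargetID', 'Category', 'Target']]
print(Individual_df)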
Output:
TargetID Category Target
0 DK12301 Marketing Reduce spend by $xy.
1 DK12302 R&D Increase spend by $dd.
2 DK12303 Infra Reduce spend by $kn.
3 JS45601 Marketing Reduce spend by $xy.
4 JS45602 R&D Increase spend by $dd.
5 JS45603 Infra Reduce spend by $kn.
6 RM78901 Marketing Increase spend by $eg.
7 RM78902 R&D Increase spend by $cb.
8 RM78903 Infra Reduce spend by $mn.
9 MS11101 Marketing Increase spend by $eg.
10 MS11102 R&D Increase spend by $cb.
11 MS11103 Infra Reduce spend by $mn.
12 SR22201 Marketing Increase spend by $eg.
13 SR22202 R&D Increase spend by $cb.
14 SR22203 Infra Reduce spend by $mn.
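The least obvious part of the merge answer is how the `01`, `02`, `03` suffixes are produced. A small sketch on a hand-made merged frame (the rows are illustrative, not real output): `groupby(...).cumcount()` numbers rows within each group from 0 in their current order, `+ 1` makes it 1-based, and `str.zfill(2)` left-pads with zeros. Grouping by Emplid assumes each Emplid identifies exactly one person.

```python
import pandas as pd

# Illustrative merged frame: each Emplid repeated once per matching target
merged = pd.DataFrame({
    "Name": ["DK", "DK", "DK", "RM", "RM"],
    "Emplid": [123, 123, 123, 789, 789],
})

# cumcount() counts rows within each group starting at 0, in row order
seq = merged.groupby("Emplid").cumcount() + 1
print(seq.tolist())  # [1, 2, 3, 1, 2]

# zfill(2) pads each string on the left with zeros to width 2
suffix = seq.astype(str).str.zfill(2)
merged["TargetID"] = merged["Name"] + merged["Emplid"].astype(str) + suffix
print(merged["TargetID"].tolist())
# ['DK12301', 'DK12302', 'DK12303', 'RM78901', 'RM78902']
```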
A second solution, using a for loop:
# Countries that at least one person belongs to
unique_countries = Person_df['Country'].unique().tolist()
for index, row in Target_df.iterrows():
    if row['Country'] in unique_countries:
        print(row.values)
        # do operation
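The loop in the second solution keeps the target rows whose country appears among the people's countries. A minimal sketch of the same filter without a Python loop, using a boolean mask from `Series.isin`. The trimmed frames and the "Japan" row (with its "$zz." target) are hypothetical, added only so that one row is filtered out.

```python
import pandas as pd

Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM"],
    "Emplid": [123, 456, 789],
    "Country": ["India", "India", "China"],
})
# The "Japan" row is hypothetical, included only to show a row being dropped
Target_df = pd.DataFrame({
    "Country": ["India", "China", "Japan"],
    "Category": ["Marketing", "Marketing", "Marketing"],
    "Target": ["Reduce spend by $xy.", "Increase spend by $eg.", "Reduce spend by $zz."],
})

# Boolean mask: True where the target's country occurs in Person_df
mask = Target_df["Country"].isin(Person_df["Country"].unique())
matching = Target_df[mask]
print(matching)
```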
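Hi Thasin, thanks for your solution. I'm trying it now. In my original data the final dataset has more columns and conditions, so it will take some time. I'm currently trying to fix "Buffer has wrong number of dimensions (expected 1, got 2)"! I'll update as soon as I'm done :)
Yes, of course. Good luck :)
Hi Thasin, it works perfectly! I'm new to Python and didn't know about dataframe joins yet. Just wondering: suppose these join options didn't exist and we were forced to use good old for loops to compare the data (as I was trying to do). How would we compare the two values across different dataframes?
@AnshulRai - I didn't understand your question. Do you want to compare two different columns using a for loop?
Yes. I want to compare the column values of one dataframe with the column values of another dataframe. (In the first row of the first dataframe, the "country" column is "India", so I want to go through the "country" column of the other dataframe and select the rows whose value is "India".) Now I know this isn't needed at all, because we can simply join on the common column. I'm still wondering whether this can be done without a join (just for learning). Thanks again, Thasin, for providing the second solution!
@AnshulRai - You're welcome. Feel free to ask Python or pandas questions. If you need any help, contact me by email.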
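On the follow-up (looping without a join): a minimal sketch, assuming the frames from the question, that splits Target_df by country once with `groupby` and then loops only over the people. Each person's targets become a dictionary lookup instead of a scan of the whole target table, which also addresses the "unnecessary iterations" in question 2. This pre-grouping technique is swapped in for illustration; it is not either answerer's code, and `targets_by_country` is a hypothetical name.

```python
import pandas as pd

Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],
    "Country": ["India", "India", "China", "China", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India"] * 3 + ["China"] * 3,
    "Category": ["Marketing", "R&D", "Infra"] * 2,
    "Target": ["Reduce spend by $xy.", "Increase spend by $dd.", "Reduce spend by $kn.",
               "Increase spend by $eg.", "Increase spend by $cb.", "Reduce spend by $mn."],
})

# Split Target_df once into {country: sub-frame}; row order within each group is kept
targets_by_country = {country: grp for country, grp in Target_df.groupby("Country", sort=False)}

rows = []
for person in Person_df.itertuples(index=False):
    grp = targets_by_country.get(person.Country)
    if grp is None:
        continue  # no targets defined for this person's country
    for seq, target in enumerate(grp.itertuples(index=False), start=1):
        rows.append((f"{person.Name}{person.Emplid}{seq:02d}", target.Category, target.Target))

Individual_df = pd.DataFrame(rows, columns=["TargetID", "Category", "Target"])
print(Individual_df)
```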