Python: using conditional logic/where when merging DataFrames
I have these DataFrames:
df1
user_id  code  name               code_equivalence  name_equivalence
51       123   bi lovers          542               bi for marketing
51       123   bi lovers          545               i love bi
51       234   datascience        345               data and science
51       234   datascience        555               data lovers
51       255   antiquity history  429               roma
51       255   antiquity history  430               greece
52       123   bi lovers          542               bi for marketing
52       123   bi lovers          545               i love bi
52       256   modern history     500               france
52       256   modern history     501               germany
52       200   arts               400               arts I
52       200   arts               401               arts II
df2
user_id  code  name       status
51       123   bi lovers  ongoing
51       430   greece     ongoing
52       501   germany    ongoing
52       050   numbers    ongoing
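For anyone who wants to reproduce the problem, the two tables above can be built roughly as follows. This is only a sketch: it assumes the codes are integers, so the code shown as 050 becomes 50 (keep the codes as strings in both frames if the leading zero matters).

```python
import pandas as pd

# df1: each user's codes together with their equivalent codes and names.
df1 = pd.DataFrame(
    [
        (51, 123, "bi lovers", 542, "bi for marketing"),
        (51, 123, "bi lovers", 545, "i love bi"),
        (51, 234, "datascience", 345, "data and science"),
        (51, 234, "datascience", 555, "data lovers"),
        (51, 255, "antiquity history", 429, "roma"),
        (51, 255, "antiquity history", 430, "greece"),
        (52, 123, "bi lovers", 542, "bi for marketing"),
        (52, 123, "bi lovers", 545, "i love bi"),
        (52, 256, "modern history", 500, "france"),
        (52, 256, "modern history", 501, "germany"),
        (52, 200, "arts", 400, "arts I"),
        (52, 200, "arts", 401, "arts II"),
    ],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)

# df2: the status of each (user_id, code, name) combination.
df2 = pd.DataFrame(
    [
        (51, 123, "bi lovers", "ongoing"),
        (51, 430, "greece", "ongoing"),
        (52, 501, "germany", "ongoing"),
        (52, 50, "numbers", "ongoing"),  # assumption: 050 stored as the integer 50
    ],
    columns=["user_id", "code", "name", "status"],
)
```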
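I want to merge them by checking whether the df2 code matches either the df1 code or the df1 code_equivalence, and whether the df2 name matches either the df1 name or the df1 name_equivalence, in order to get the df2 status.
Like this:
merged df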
user_id  code  name               code_equivalence  name_equivalence  status
51       123   bi lovers          542               bi for marketing  ongoing
51       123   bi lovers          545               i love bi         ongoing
51       234   datascience        345               data and science  (null)
51       234   datascience        555               data lovers       (null)
51       255   antiquity history  429               roma              (null)
51       255   antiquity history  430               greece            ongoing
52       123   bi lovers          542               bi for marketing  (null)
52       123   bi lovers          545               i love bi         (null)
52       256   modern history     500               france            (null)
52       256   modern history     501               germany           ongoing
52       200   arts               400               arts I            (null)
52       200   arts               401               arts II           (null)
After that, I want to transform the data to produce a new df, as follows:
final df
user_id  code  name               code_equivalence  name_equivalence                 status
51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    ongoing
51       234   datascience        [345, 555]        [data and science, data lovers]  (null)
51       255   antiquity history  [429, 430]        [roma, greece]                   ongoing
52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    (null)
52       256   modern history     [500, 501]        [france, germany]                ongoing
52       200   arts               [400, 401]        [arts I, arts II]                (null)
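Can anyone help me?
I'm not sure I understood your question correctly, but from what I read, you have already done the merge and now want to get the final result? If so, and assuming merged is your merged DataFrame, this should do the job: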
>>> merged.groupby(['user_id','code','name']).agg(list).reset_index()
   user_id  code  name               code_equivalence  name_equivalence                 status
0  51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [ongoing, ongoing]
1  51       234   datascience        [345, 555]        [data and science, data lovers]  [(null), (null)]
2  51       255   antiquity history  [429, 430]        [roma, greece]                   [(null), ongoing]
3  52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [(null), (null)]
4  52       200   arts               [400, 401]        [arts I, arts II]                [(null), nan]
5  52       256   modern history     [500, 501]        [france, germany]                [(null), ongoing]
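Note that agg(list) also turns status into a list, while the final df in the question shows a single status per group. One possible way to get that shape is named aggregation, collecting the equivalence columns into lists and taking the first non-null status. The sketch below rebuilds merged inline from the question's table and assumes that (null) means a missing value (None here); sort=False is used only to keep the groups in their original order.

```python
import pandas as pd

# Assumed stand-in for the question's merged df; None represents (null).
merged = pd.DataFrame(
    [
        (51, 123, "bi lovers", 542, "bi for marketing", "ongoing"),
        (51, 123, "bi lovers", 545, "i love bi", "ongoing"),
        (51, 234, "datascience", 345, "data and science", None),
        (51, 234, "datascience", 555, "data lovers", None),
        (51, 255, "antiquity history", 429, "roma", None),
        (51, 255, "antiquity history", 430, "greece", "ongoing"),
        (52, 123, "bi lovers", 542, "bi for marketing", None),
        (52, 123, "bi lovers", 545, "i love bi", None),
        (52, 256, "modern history", 500, "france", None),
        (52, 256, "modern history", 501, "germany", "ongoing"),
        (52, 200, "arts", 400, "arts I", None),
        (52, 200, "arts", 401, "arts II", None),
    ],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence", "status"],
)

final = (
    merged.groupby(["user_id", "code", "name"], sort=False)
    .agg(
        code_equivalence=("code_equivalence", list),  # collect every equivalent code
        name_equivalence=("name_equivalence", list),  # collect every equivalent name
        status=("status", "first"),                   # first non-null status in the group
    )
    .reset_index()
)
print(final)
```

Groups in which every status is missing keep a missing status, because "first" skips nulls and has nothing left to return.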
If you only have df1 and df2, then this is the complete solution:
>>> (pd
...  .merge(df1, df2, on=['user_id', 'code', 'name'], how='left')
...  .groupby(['user_id', 'code', 'name'])
...  .agg(list)
...  .reset_index())
   user_id  code  name               code_equivalence  name_equivalence                 status
0  51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [ongoing, ongoing]
1  51       234   datascience        [345, 555]        [data and science, data lovers]  [nan, nan]
2  51       255   antiquity history  [429, 430]        [roma, greece]                   [nan, nan]
3  52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [nan, nan]
4  52       200   arts               [400, 401]        [arts I, arts II]                [nan, nan]
5  52       256   modern history     [500, 501]        [france, germany]                [nan, nan]
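In the output above, the rows that only match through the equivalence columns (51/255 via greece, 52/256 via germany) end up with nan, because the merge keys compare only code and name. A rough sketch of the "match on either pair" logic from the question is shown below. It uses a plain dictionary lookup rather than merge, and it assumes that each (user_id, code, name) combination appears at most once in df2 and that codes are integers (050 is written as 50).

```python
import pandas as pd

df1 = pd.DataFrame(
    [
        (51, 123, "bi lovers", 542, "bi for marketing"),
        (51, 123, "bi lovers", 545, "i love bi"),
        (51, 234, "datascience", 345, "data and science"),
        (51, 234, "datascience", 555, "data lovers"),
        (51, 255, "antiquity history", 429, "roma"),
        (51, 255, "antiquity history", 430, "greece"),
        (52, 123, "bi lovers", 542, "bi for marketing"),
        (52, 123, "bi lovers", 545, "i love bi"),
        (52, 256, "modern history", 500, "france"),
        (52, 256, "modern history", 501, "germany"),
        (52, 200, "arts", 400, "arts I"),
        (52, 200, "arts", 401, "arts II"),
    ],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)
df2 = pd.DataFrame(
    [
        (51, 123, "bi lovers", "ongoing"),
        (51, 430, "greece", "ongoing"),
        (52, 501, "germany", "ongoing"),
        (52, 50, "numbers", "ongoing"),  # assumption: 050 stored as 50
    ],
    columns=["user_id", "code", "name", "status"],
)

# Status keyed by (user_id, code, name); assumes the keys are unique in df2.
lookup = dict(zip(zip(df2["user_id"], df2["code"], df2["name"]), df2["status"]))

# For every df1 row, build the direct key and the equivalence key.
direct_keys = zip(df1["user_id"], df1["code"], df1["name"])
equiv_keys = zip(df1["user_id"], df1["code_equivalence"], df1["name_equivalence"])

merged = df1.copy()
# Prefer the direct match; fall back to the equivalence match; otherwise None.
merged["status"] = [
    lookup.get(direct, lookup.get(equiv))
    for direct, equiv in zip(direct_keys, equiv_keys)
]
print(merged)
```

Because the row count and order of df1 never change, no codes or names from df1 are lost; unmatched rows simply get a missing status.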
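This is how I got the DataFrame, in three steps:
# Step 1: look up the status through the row's own code.
merge_df = pd.merge(df1, df2[["code", "status"]], on="code", how="left")
# Step 2: look up the status through the equivalent code.
merge_df2 = pd.merge(df1, df2[["code", "status"]], left_on="code_equivalence", right_on="code", how="left")
# Step 3: fill the statuses missing after step 1 with those from step 2.
# Assigning the result avoids calling fillna(..., inplace=True) on a column selection.
# Note: both joins use the code only; add user_id to the keys if statuses must be per user.
merge_df["status"] = merge_df["status"].fillna(merge_df2["status"])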
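The fillna step above lines the two merge results up by row position, which only holds if neither merge adds rows, that is, if every code occurs at most once on the df2 side. One way to make that assumption explicit is the validate argument of pd.merge. The sketch below uses small hypothetical frames (left, right and the "done" status are made up for illustration, not taken from the question).

```python
import pandas as pd

# Hypothetical miniature frames, not the question's data.
left = pd.DataFrame({"code": [123, 123, 234], "code_equivalence": [542, 545, 345]})
right = pd.DataFrame({"code": [123, 345], "status": ["ongoing", "done"]})

# validate="many_to_one" raises if a key is duplicated on the right side.
by_code = pd.merge(left, right, on="code", how="left", validate="many_to_one")
by_equiv = pd.merge(
    left, right, left_on="code_equivalence", right_on="code",
    how="left", validate="many_to_one",
)

# Both results have one row per row of `left`, in the same order,
# so filling by index position is safe here.
status = by_code["status"].fillna(by_equiv["status"])

# With a duplicated key on the right, the merge is rejected instead of
# silently producing extra rows that would misalign the fillna step.
duplicated = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    pd.merge(left, duplicated, on="code", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```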
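However, I would like to know whether there is a simpler way to do this (there probably is).
Solid requirements, but what problems have you run into so far? merge and groupby are the right tools for this task.
Even with a left join, I lose the codes and names that are unique to df1 when I use merge.
Can you check the fifth row of merge_df? Based on df1 and df2, I don't think there should be a match, so shouldn't the status column be (null)? I mean the row with code 255 and name roma.
@Michał89 Yes, you're right. It is null.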