Python: merge only on the not-null values in a column and keep the rows that have nulls


DF1:

DF2:

Expected:

     A    B  C
0  2.0  3.0  7
1  3.0  3.0  7
2  NaN  NaN  7
3  NaN  NaN  7
4  4.0  4.0  7
I have been trying the code below.

Code:

   A   B    C_x  C_y
0  2   3    6    7
3  NaN NaN  6    7
4  NaN NaN  6    7
5  4   4    6    7

Can anyone help me get a merged result that contains both the not-null and the null rows? I have tried inner and left join conditions.
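To make the problem concrete, here is a minimal sketch of the two joins mentioned above, using the df1 and df2 definitions that appear in the code on this page. The variable names inner and left are illustrative only. It suggests that neither join, taken with df1 on the left, can return the null-key rows of df2, because df1 has no null keys to pair them with.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4],
                    'B': [3, 3, np.nan, np.nan, 4],
                    'C': [7, 7, 7, 7, 7]})

# Inner join: only the key pairs present in both frames survive.
inner = df1.merge(df2, on=['A', 'B'], how='inner')

# Left join (df1 on the left): every df1 row is kept, but the
# NaN-key rows of df2 have nothing to match and are dropped.
left = df1.merge(df2, on=['A', 'B'], how='left')

print(len(inner), len(left))
print(inner['A'].isna().any(), left['A'].isna().any())
```

In both results the rows of df2 whose A and B are NaN are missing, which is why an extra step is needed to carry them through.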

Not sure whether this is feasible using merge alone. Below is an example using concat:

import numpy as np
import pandas as pd


def get_df_merged_result(df1, df2, join_condition, column_list):
    # Merge df1 and df2 on column_list using the given join type
    return pd.merge(df1, df2, how=join_condition, on=column_list)

# Create the DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4], 'B': [3, 3, np.nan, np.nan, 4], 'C': [7, 7, 7, 7, 7]})

print(df1)
print('-------------')
print(df2)
print('-------------')
print(get_df_merged_result(df1, df2, 'inner', ['A', 'B']))

See the comments. Hope this helps.

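Do you need the expected order?

Yes. I need the expected order.

Why does the expected output need 5 records (A/B=4/4) rather than 2 (DF2) (they are the same)?

Hmm, not sure what you are asking. I am trying to join two DataFrames, and where one of them has null values I do not want those rows to be joined; they should stay as they are.

@RjThomas Thanks. Let me know if you find a better solution.
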
df1 = pd.DataFrame({'A':[1,2,3,4],'B':[2,3,4,4], 'C':[6,6,6,6]})
df2 = pd.DataFrame({'A':[2,3,np.nan,np.nan,4],'B':[3,3,np.nan,np.nan,4],'C':[7,7,7,7,7]})

# keep the original row positions so the order can be restored later
df1 = df1.reset_index()
df2 = df2.reset_index()
# match records on A / B (inner join)
df = df1.merge(df2, on=['A', 'B'])
df = df.rename(columns={'index_x': 'seq'}).drop(columns=['index_y'])
# select the df2 records whose A and B are both NaN
nan_df = df2[(df2['A'].isna()) & (df2['B'].isna())]
nan_df = nan_df.rename(columns={'C': 'C_y'})
# build the C_x - C_y relations and merge them into the NaN records
nan_df = nan_df.merge(df[['C_x', 'C_y']].drop_duplicates(), on=['C_y'])
# union of the joined A / B records and the NaN records
df = pd.concat([nan_df, df], sort=False).reset_index(drop=True)

def seq(x):
    # NaN records have no 'seq' yet, so fall back to their df2 position
    if pd.isna(x['seq']):
        x['seq'] = x['index']
    return x

# sort by the original indexes
df = df.apply(seq, axis=1)
df = df.sort_values(['seq'])
df = df.drop(columns=['index', 'seq'])
print(df.head())
#      A    B  C_y  C_x
# 2  2.0  3.0  7.0  6.0
# 0  NaN  NaN  7.0  6.0
# 1  NaN  NaN  7.0  6.0
# 3  4.0  4.0  7.0  6.0
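
As a possible alternative, here is a minimal sketch (not the answer's method) that splits df2 into rows with complete keys and rows with null keys, merges only the complete rows, and then puts the null rows back in their original positions. It assumes that keeping all five rows of df2 in df2's order is the goal and that leaving the df1 value empty for unmatched or null-key rows is acceptable. The names keys, has_keys, keyed, unkeyed and the '_df1' suffix are illustrative choices, not from the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4],
                    'B': [3, 3, np.nan, np.nan, 4],
                    'C': [7, 7, 7, 7, 7]})

keys = ['A', 'B']

# True for df2 rows where every join key is present.
has_keys = df2[keys].notna().all(axis=1)
keyed = df2[has_keys]
unkeyed = df2[~has_keys]

# Merge only the rows with complete keys. reset_index() stores df2's
# row labels in an 'index' column so they survive the merge.
merged = (keyed.reset_index()
               .merge(df1, on=keys, how='left', suffixes=('', '_df1'))
               .set_index('index'))

# Put the untouched null-key rows back and restore df2's order.
result = pd.concat([merged, unkeyed]).sort_index()
result.index.name = None

print(result)
```

Columns A, B and C of result are the rows of df2 unchanged, and C_df1 holds the matching df1 value where a match exists. Unlike the answer above, this sketch does not fill C_df1 for the null-key rows; whether that is wanted depends on the intended output.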