Python 3.x: Merging dataframe rows that are missing different columns


python-3.x pandas


I have a pandas dataframe that looks like this:

asset, cusip, information1, information2, ..., information_n
1x4,   43942, 45,           NaN,          ..., NaN
1x4,   43942, NaN,          "hello",      ..., NaN
1x4,   43942, NaN,          NaN,          ..., "goodbye"
...

What I want is:

asset, cusip, information1, information2, ..., information_n
1x4,   43942, 45,           "hello",      ..., "goodbye"
...

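Essentially, I want to collapse rows that have matching asset and cusip values, regardless of the other fields. In each of the columns information1 ... information_n, only one entry is non-NaN.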

Note that some columns may be ints, some strings, others floats, and so on.

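You can use groupby with first(), which returns the first non-null value of each column within each group; in your case, that is the only non-NaN value.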

df = df.groupby(['asset', 'cusip']).first().reset_index()


    asset   cusip   information1    information2    information_n
0   1x4     43942   45              "hello"         "goodbye"
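A self-contained sketch of the answer above, assuming a small frame built to mirror the question (the column names come from the question; the values are illustrative). It rebuilds the three partial rows and collapses them with groupby(...).first().

```python
import numpy as np
import pandas as pd

# Sample frame modeled on the question: each information column has exactly
# one non-NaN value within the (asset, cusip) group.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],
    "information2": [np.nan, "hello", np.nan],
    "information_n": [np.nan, np.nan, "goodbye"],
})

# GroupBy.first() picks the first non-null value of each column per group,
# so the three partial rows collapse into one complete row.
collapsed = df.groupby(["asset", "cusip"]).first().reset_index()
print(collapsed)
```

Regarding the mixed dtypes mentioned in the question: a numeric column that contains NaN is stored as float64, so in this sketch information1 comes back as 45.0 rather than 45, while the string columns stay object dtype.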

Nice answer! You can also pass the as_index=False argument and drop reset_index:

df = df.groupby(['asset', 'cusip'], as_index=False).first()
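A small check of the comment's suggestion, using the same illustrative frame as the question: with as_index=False the group keys stay as ordinary columns, so the result should match the reset_index() version.

```python
import numpy as np
import pandas as pd

# Illustrative frame mirroring the question's layout.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],
    "information2": [np.nan, "hello", np.nan],
    "information_n": [np.nan, np.nan, "goodbye"],
})

# as_index=False keeps asset and cusip as columns, so no reset_index() is needed.
via_as_index = df.groupby(["asset", "cusip"], as_index=False).first()

# The original answer's route: keys become the index, then are moved back.
via_reset = df.groupby(["asset", "cusip"]).first().reset_index()

print(via_as_index.equals(via_reset))
```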

@piRSquared, thanks! I need to remember as_index=False. I'm so used to using groupby followed by reset_index :)