pandas concat generates NaN values


python pandas dataframe


I am curious why a simple concatenation of two data frames in pandas:

shape: (66441, 1)
dtypes: prediction    int64
dtype: object
isnull().sum(): prediction    0
dtype: int64

shape: (66441, 1)
CUSTOMER_ID    int64
dtype: object
isnull().sum() CUSTOMER_ID    0
dtype: int64

which have the same shape and neither of which contains NaN values,

foo = pd.concat([initId, ypred], join='outer', axis=1)
print(foo.shape)
print(foo.isnull().sum())

can produce a lot of NaN values when joined:

(83384, 2)
CUSTOMER_ID    16943
prediction     16943

How can I fix this and prevent NaN values from being introduced? I tried to reproduce it like this:

aaa  = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'])
print(aaa)
bbb  = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
pd.concat([aaa, bbb], axis=1)

but that failed to reproduce the problem: it works fine, and no NaN values are introduced.

I think the problem is that the index values differ, so wherever concat cannot align the rows you get NaN:

aaa  = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'], index=[4,5,8,7,10,12])
print(aaa)
    prediction
4            0
5            1
8            0
7            1
10           0
12           0

bbb  = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
   groundTruth
0            0
1            0
2            1
3            0
4            1
5            1

print (pd.concat([aaa, bbb], axis=1))
    prediction  groundTruth
0          NaN          0.0
1          NaN          0.0
2          NaN          1.0
3          NaN          0.0
4          0.0          1.0
5          1.0          1.0
7          1.0          NaN
8          0.0          NaN
10         0.0          NaN
12         0.0          NaN
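
Before concatenating, it can help to check whether the two indexes actually match. The following is only a minimal sketch that reuses the small example frames from above; the variable names same_index, shared and union_size are illustrative and not part of the original question.

```python
import pandas as pd

# Frames mirroring the example above: same length, different index labels.
aaa = pd.DataFrame([0, 1, 0, 1, 0, 0], columns=['prediction'], index=[4, 5, 8, 7, 10, 12])
bbb = pd.DataFrame([0, 0, 1, 0, 1, 1], columns=['groundTruth'])

# True only if both indexes hold the same labels in the same order.
same_index = aaa.index.equals(bbb.index)

# Labels present in both frames; only these rows end up fully populated.
shared = aaa.index.intersection(bbb.index)

# An outer join along axis=1 produces one row per label in the union.
union_size = len(aaa.index.union(bbb.index))

print(same_index)
print(sorted(shared))
print(union_size)
```

Here the union has 10 labels, which matches the 10 rows printed above. The numbers in the question are consistent with the same effect: 83384 - 66441 = 16943 labels are missing from each frame, which is exactly the NaN count reported per column.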

A solution, if the index values are not needed, is to reset them:

aaa.reset_index(drop=True, inplace=True)
bbb.reset_index(drop=True, inplace=True)

print(aaa)
   prediction
0           0
1           1
2           0
3           1
4           0
5           0

print(bbb)
   groundTruth
0            0
1            0
2            1
3            0
4            1
5            1

print (pd.concat([aaa, bbb], axis=1))
   prediction  groundTruth
0           0            0
1           1            0
2           0            1
3           1            0
4           0            1
5           0            1
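
If one frame's index should be kept rather than discarded, a possible variation is to give the second frame the labels of the first before concatenating. This is a sketch under the assumption that both frames have the same number of rows and that row order is what pairs them; concat_by_position is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def concat_by_position(left, right):
    """Hypothetical helper: join two frames column-wise by row position."""
    if len(left) != len(right):
        raise ValueError("frames must have the same number of rows")
    # Give `right` the labels of `left` so concat has nothing to misalign.
    # set_axis returns a new object here; neither input is modified.
    return pd.concat([left, right.set_axis(left.index, axis=0)], axis=1)

aaa = pd.DataFrame([0, 1, 0, 1, 0, 0], columns=['prediction'], index=[4, 5, 8, 7, 10, 12])
bbb = pd.DataFrame([0, 0, 1, 0, 1, 1], columns=['groundTruth'])

result = concat_by_position(aaa, bbb)
print(result)
print(result.isnull().sum())
```

Unlike reset_index(drop=True, inplace=True), this leaves aaa and bbb untouched, which may matter if the original labels are needed later.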

You can do something like this:

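# Reset each index so the rows are aligned by position, then join column-wise.
# Note: ignore_index=True with axis=1 replaces the column labels with 0, 1, 2, ...
concatenated_dataframes = pd.concat(
    [
        dataframe_1.reset_index(drop=True),
        dataframe_2.reset_index(drop=True),
        dataframe_3.reset_index(drop=True)
    ],
    axis=1,
    ignore_index=True,
)

# Collect the original column names, one list per frame.
concatenated_dataframes_columns = [
    list(dataframe_1.columns),
    list(dataframe_2.columns),
    list(dataframe_3.columns)
]

# Flatten the list of lists and restore the original column names.
flatten = lambda nested_lists: [item for sublist in nested_lists for item in sublist]

concatenated_dataframes.columns = flatten(concatenated_dataframes_columns)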
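
The column names have to be restored in the snippet above because, with axis=1, ignore_index=True renumbers the columns rather than the rows. A small sketch with made-up frames (df1 and df2 are illustrative only) shows the difference, and suggests that simply omitting ignore_index may be enough once the indexes have been reset:

```python
import pandas as pd

# Two illustrative frames with unrelated index labels.
df1 = pd.DataFrame({'a': [1, 2]}, index=[10, 11])
df2 = pd.DataFrame({'b': [3, 4]}, index=[20, 21])

# Reset the indexes so rows pair up by position.
frames = [df.reset_index(drop=True) for df in (df1, df2)]

# With axis=1, ignore_index=True relabels the columns as 0, 1, ...
renumbered = pd.concat(frames, axis=1, ignore_index=True)
print(list(renumbered.columns))  # [0, 1]

# Leaving ignore_index at its default keeps the original column names.
named = pd.concat(frames, axis=1)
print(list(named.columns))  # ['a', 'b']
```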

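This concatenates multiple DataFrames while keeping the column names and avoiding NaN.

Do you recommend reset_index() / ignore_index=True? Neither of them solves the problem.

Hmm, and if you reset the index, is it still the same problem?

Actually, your comment above about pd.concat([ypred.reset_index(drop=True), initId.reset_index(drop=True)], axis=1) worked great! Thanks a lot.

It seems that after any drop of rows, reset_index must be performed to avoid such index problems later during processing.

I ran into the same problem; even when I simply tried to add a column, it gave me NaN. Resetting the index solved it for me.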
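
The last two comments describe the same label alignment showing up in plain column assignment. The sketch below uses made-up data to illustrate it: after a row is dropped, assigning a Series aligns on index labels, while assigning a NumPy array goes by position.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})
# Dropping a row leaves a gap in the index: the labels are now 0, 2, 3.
df = df[df['x'] != 2].copy()

# A fresh Series gets the default index 0, 1, 2.
new_values = pd.Series([10, 20, 30])

# Assignment aligns on labels: label 3 has no match and becomes NaN,
# and the value stored under label 1 is silently discarded.
df['aligned'] = new_values

# Passing a plain NumPy array skips alignment and assigns by position.
df['positional'] = new_values.to_numpy()

print(df)
```

Calling df.reset_index(drop=True) right after the filtering step would make both assignments behave the same way.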