NaN in a Python pd.DataFrame (symmetrical matrix)

python pandas

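I have a DataFrame like the one below. I want to drop the NaN values and shift the remaining cells up, then add a date column and set it as the index.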

                ciao      google    microsoft
Search Volume   368000    NaN       NaN
Search Volume   368000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000

The output should look like this:

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

date        ciao    google      microsoft
20140115    368000  37200000    135000
20140215    368000  37200000    135000
20140315    450000  37200000    110000
20140415    450000  37200000    110000
20140515    450000  37200000    110000
20140615    450000  37200000    110000

It looks simple, but I can't figure out how to do it. Thanks.

This should work:

denulled = {col: df.loc[df[col].notnull(),col].values for col in df.columns}

df_out = pd.DataFrame(denulled, index=date)
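To try this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same comprehension. The way the sample frame is built (the `gap` list and the per-column lists) is an assumption made for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

# Sample data copied from the question; `gap` is a hypothetical helper
# holding the six NaN rows that pad each column.
ciao = [368000, 368000, 450000, 450000, 450000, 450000]
google = [37200000] * 6
microsoft = [135000, 135000, 110000, 110000, 110000, 110000]
gap = [np.nan] * 6

df = pd.DataFrame(
    {
        'ciao': ciao + gap + gap,
        'google': gap + google + gap,
        'microsoft': gap + gap + microsoft,
    },
    index=['Search Volume'] * 18,
)

# Keep only the non-null values of each column, as plain arrays,
# then rebuild the frame on the date index.
denulled = {col: df.loc[df[col].notnull(), col].values for col in df.columns}
df_out = pd.DataFrame(denulled, index=date)
df_out.index.name = 'date'
print(df_out)
```

This only works because every column has the same number of non-null values as there are dates; with uneven counts the constructor would raise an error.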

You can also call dropna on each column as a Series:

# index=df.columns keeps the column names after the transpose
df1 = pd.DataFrame(data=[df[i].dropna().values for i in df.columns], index=df.columns).T
df1.index = date

My suggestion:

pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
    index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])

The key is a dictionary comprehension that runs over every column.

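dropna removes the NaN entries, and values strips the old index labels so the remaining data is free to take the new index.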
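The role of `values` can be seen in a small sketch. The Series `s` and its labels below are hypothetical, chosen only to show what happens when the old index is left in place: pandas aligns on labels, so nothing matches the new index.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one gap; labels 'a' and 'c' survive dropna().
s = pd.Series([1.0, np.nan, 2.0], index=['a', 'b', 'c'])
new_index = ['d1', 'd2']

# Passing the Series itself: pandas aligns on labels, and none of
# 'a'/'c' appear in new_index, so the column ends up all NaN.
aligned = pd.DataFrame({'x': s.dropna()}, index=new_index)

# Passing the bare array: values are placed positionally.
positional = pd.DataFrame({'x': s.dropna().values}, index=new_index)

print(aligned)
print(positional)
```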

A slightly tricky solution, because your index contains duplicates:

pd.concat([df[x].dropna() for x in df.columns], axis=1)
Out[24]: 
                  ciao      google  microsoft
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
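A variant that does not depend on the duplicated labels lining up is to reset each column's index before concatenating, then attach the dates. This is a sketch rather than the answerer's method; the sample frame construction and the name `shifted` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']
gap = [np.nan] * 6  # hypothetical padding for the staggered layout

df = pd.DataFrame(
    {
        'ciao': [368000, 368000, 450000, 450000, 450000, 450000] + gap + gap,
        'google': gap + [37200000] * 6 + gap,
        'microsoft': gap + gap + [135000, 135000, 110000, 110000, 110000, 110000],
    },
    index=['Search Volume'] * 18,
)

# Drop NaN per column and replace the repeated label with 0..5,
# so the columns align by position when concatenated side by side.
shifted = pd.concat(
    [df[x].dropna().reset_index(drop=True) for x in df.columns],
    axis=1,
)
shifted.index = pd.Index(date, name='date')
print(shifted)
```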

You can use apply together with dropna:

df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)

Output:

     ciao      google   microsoft  date     
 368000.0  37200000.0   135000.0   20140115 
 368000.0  37200000.0   135000.0   20140215 
 450000.0  37200000.0   110000.0   20140315 
 450000.0  37200000.0   110000.0   20140415 
 450000.0  37200000.0   110000.0   20140515 
 450000.0  37200000.0   110000.0   20140615
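The question asked for the dates as the index rather than as a column. A minimal sketch of that adjustment follows, assuming every column has exactly one value per date so that `fillna('')` is not needed and the numbers can be cast back to integers. The sample frame construction and the name `shifted` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']
gap = [np.nan] * 6  # hypothetical padding for the staggered layout

df = pd.DataFrame(
    {
        'ciao': [368000, 368000, 450000, 450000, 450000, 450000] + gap + gap,
        'google': gap + [37200000] * 6 + gap,
        'microsoft': gap + gap + [135000, 135000, 110000, 110000, 110000, 110000],
    },
    index=['Search Volume'] * 18,
)

# Compact each column, then use the dates as the index and
# restore integer dtype (safe here because no NaN remains).
shifted = df.apply(lambda x: pd.Series(x.dropna().values))
shifted = shifted.set_index(pd.Index(date, name='date')).astype('int64')
print(shifted)
```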
