NaN in a Python pd.DataFrame (symmetrical matrix)
I have a DataFrame like the one below. I want to drop the NaN values and shift the remaining cells up, then add a date column and set it as the index.
ciao google microsoft
Search Volume 368000 NaN NaN
Search Volume 368000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN NaN 135000
Search Volume NaN NaN 135000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
The output should look like this:
date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']
date ciao google microsoft
20140115 368000 37200000 135000
20140215 368000 37200000 135000
20140315 450000 37200000 110000
20140415 450000 37200000 110000
20140515 450000 37200000 110000
20140615 450000 37200000 110000
It looks simple, but I can't figure out how to do it. Thanks.
This should work:
denulled = {col: df.loc[df[col].notnull(),col].values for col in df.columns}
df_out = pd.DataFrame(denulled, index=date)
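For anyone who wants to run the answer above end to end, here is a minimal sketch that rebuilds the sample frame from the question first. The way the frame is constructed and the length check are assumptions added for illustration; only the last two statements before the print come from the answer.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuild the sample frame from the question: each column holds six
# non-NaN values, stacked in its own block of rows.
df = pd.DataFrame(
    {
        "ciao": [368000, 368000, 450000, 450000, 450000, 450000] + [nan] * 12,
        "google": [nan] * 6 + [37200000] * 6 + [nan] * 6,
        "microsoft": [nan] * 12 + [135000, 135000, 110000, 110000, 110000, 110000],
    },
    index=["Search Volume"] * 18,
)
date = ["20140115", "20140215", "20140315", "20140415", "20140515", "20140615"]

# Guard added for illustration: every column must have exactly len(date)
# non-NaN values, otherwise the DataFrame constructor below raises.
counts = df.notnull().sum()
assert (counts == len(date)).all(), counts

# The answer's approach: keep the non-NaN values of each column as a bare
# array, then build a new frame indexed by the dates.
denulled = {col: df.loc[df[col].notnull(), col].values for col in df.columns}
df_out = pd.DataFrame(denulled, index=date)
df_out.index.name = "date"
print(df_out)
```

The values come out as floats because the original columns contained NaN; `df_out.astype(int)` would turn them back into integers if that matters.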
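You can also call dropna on each column as a Series:
df1 = pd.DataFrame(data=[df[i].dropna().values for i in df.columns], index=df.columns).T  # index=df.columns keeps the column names after .T
df1.index = date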
My suggestion:
pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])
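The key point is the dictionary comprehension over the columns: dropna removes the NaN entries, and values discards the old index labels so the remaining numbers can be placed under the new index.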
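To see why dropping the index labels with values matters, here is a small sketch on a toy Series. The Series and the labels "d1" and "d2" are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0], index=["a", "b", "c"])
kept = s.dropna()  # NaN removed, but still labelled "a" and "c"
new_index = ["d1", "d2"]

# Passing the Series itself: pandas aligns on labels, finds neither "d1"
# nor "d2" in the Series, and fills the column with NaN.
aligned = pd.DataFrame({"x": kept}, index=new_index)

# Passing the bare NumPy array: the values are placed by position.
positional = pd.DataFrame({"x": kept.values}, index=new_index)

print(aligned)
print(positional)
```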
A tricky solution that works because the index is duplicated (every remaining row carries the same label):
pd.concat([df[x].dropna() for x in df.columns], axis=1)
Out[24]:
ciao google microsoft
SearchVolume 368000.0 37200000.0 135000.0
SearchVolume 368000.0 37200000.0 135000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
You can use apply together with dropna:
df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)
Output:
ciao google microsoft date
368000.0 37200000.0 135000.0 20140115
368000.0 37200000.0 135000.0 20140215
450000.0 37200000.0 110000.0 20140315
450000.0 37200000.0 110000.0 20140415
450000.0 37200000.0 110000.0 20140515
450000.0 37200000.0 110000.0 20140615
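The question also asks for the date to become the index, which the output above leaves as an ordinary column. A minimal sketch of that last step follows; the frame is rebuilt by hand from the printed output, and parsing the strings with pd.to_datetime is an optional extra that the question did not request.

```python
import pandas as pd

# Stand-in for the frame printed above, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "ciao": [368000.0, 368000.0, 450000.0, 450000.0, 450000.0, 450000.0],
        "google": [37200000.0] * 6,
        "microsoft": [135000.0, 135000.0, 110000.0, 110000.0, 110000.0, 110000.0],
        "date": ["20140115", "20140215", "20140315", "20140415", "20140515", "20140615"],
    }
)

# Optional: turn the YYYYMMDD strings into real timestamps.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# Move the date column into the index.
df = df.set_index("date")
print(df)
```

Skipping the pd.to_datetime line keeps the index as plain strings, which matches the output requested in the question.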