Python: extracting the maximum of a MultiIndex DataFrame that contains strings and NaN
I have the following MultiIndex DataFrame:
first               bar                   baz                   foo
second              one        two        one        two        one        two
first second
bar   one           NaN  -0.056213   0.988634   0.103149     1.5858  -0.101334
      two      -0.47464  -0.010561   2.679586  -0.080154        <LQ  -0.422063
baz   one           <LQ   0.220080   1.495349   0.302883  -0.205234   0.781887
      two      0.638597   0.276678  -0.408217  -0.083598   -1.15187  -1.724097
foo   one      0.275549  -1.088070   0.259929  -0.782472    -1.1825  -1.346999
      two      0.857858   0.783795  -0.655590  -1.969776  -0.964557  -0.220568
Here is what I tried:
df.xs('one', level=1, axis = 1).max(axis=0, level=1, skipna = True, numeric_only = False)
The result I am after is:
first           bar        baz        foo
second
one        0.275549   1.495349     1.5858
two        0.857858   2.679586  -0.964557
What I actually get is:
first           baz
second
one        1.495349
two        2.679586
How can I make pandas not ignore an entire column when a single cell contains a string?
(The DataFrame was created as follows:)
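import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
# Put a missing value and a non-numeric string into column ('bar', 'one').
# The column is cast to object first so that it can hold a string.
df.loc[('bar', 'one'), ('bar', 'one')] = np.nan
df[('bar', 'one')] = df[('bar', 'one')].astype(object)
df.loc[('baz', 'one'), ('bar', 'one')] = '<LQ'
I think you need to replace the non-numeric values with NaN: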
(df.xs('one', level=1, axis=1)
.apply(pd.to_numeric, errors='coerce')
.max(level=1,skipna=True)
)
Note that your columns and your index have the same level names, which makes this quite confusing.
Output (with np.random.seed(1)):
first           bar        baz        foo
second
one        0.900856   1.133769   0.865408
two        1.744812   0.319039   0.901591
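For readers on a recent pandas release: the `level` argument of `DataFrame.max` was deprecated and, as far as I know, removed in pandas 2.0, so the snippet above may raise a `TypeError` there. Below is a minimal sketch of the same idea using `groupby(level=...)` instead. The small frame and its values are made up for illustration, and the column level names `col_first` and `col_second` are assumptions of mine, chosen so that rows and columns no longer share level names.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's layout (values are invented).
rows = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
cols = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["col_first", "col_second"]
)
data = [
    [np.nan, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    ["<LQ", 9.0, 10.0, 11.0],   # non-numeric marker, as in the question
    [12.0, 13.0, 14.0, 15.0],
    [0.5, 17.0, 18.0, 19.0],
    [-1.0, 21.0, 22.0, 23.0],
]
df = pd.DataFrame(data, index=rows, columns=cols)

result = (
    df.xs("one", level="col_second", axis=1)   # keep only the 'one' columns
      .apply(pd.to_numeric, errors="coerce")   # '<LQ' is coerced to NaN
      .groupby(level="second")                 # group rows by the 'second' index level
      .max()                                   # NaN values are skipped by default
)
print(result)
```

Because the string is coerced to NaN before the reduction, the `bar` column stays in the result instead of being dropped, and its maximum is taken over the remaining numeric cells.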