Python: How do I drop columns in Pandas that contain non-zero values in fewer than 1% of the rows?
I have the following dataset:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 ... Col991 Col992 Col993 Col994 Col995 Col996 Col997 Col998 Col999 Col1000
rows
Row1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row2 0 0 0 0 0 23 0 0 0 0 ... 0 0 0 0 7 0 0 0 0 0
Row3 97 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row5 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Row496 182 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 116 0 0 0
Row497 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row498 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row499 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Row500 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 125 0 0 0
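I am trying to drop the columns whose total number of non-zero entries is less than 1% of the number of rows.
I can compute the percentage of non-zero entries per column with: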
(df[df > 0.0].count()/df.shape[0])*100
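To make it clearer why that expression works, here is a minimal sketch on a hypothetical three-column frame (the names x, y, z and the values are made up for illustration). Masking with df[df > 0] turns every entry that fails the test into NaN, and count() skips NaN, so the count is the number of positive entries per column. Taking the mean of the boolean mask gives the same percentage.

```python
import pandas as pd

# Hypothetical toy data, not the dataset from the question.
df = pd.DataFrame({"x": [0, 3, 0, 0], "y": [0, 0, 0, 0], "z": [1, 2, 0, 4]})

masked = df[df > 0]                              # entries that are not > 0 become NaN
pct_count = masked.count() / df.shape[0] * 100   # count() ignores NaN
pct_mean = df.gt(0).mean() * 100                 # mean of True/False is the fraction of True

print(pct_count)
print(pct_mean)
```

Both series should hold the same per-column percentages; the mean form just avoids building the masked copy.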
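How should I use this to get a
df
that keeps only the columns with non-zero values in at least 1% of the rows? Also, how would I change the code to drop the rows that have non-zero values in fewer than 1% of the columns?
Use mean on a boolean mask to compute the fraction of non-zero entries per column, then select columns with loc:
df.loc[:, df.ne(0).mean() >= 0.01]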
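As a rough sketch of a mean-based column filter, the example below uses a hypothetical 200-row frame (columns A to D are invented here), so 1% of the rows is 2 rows. It assumes "non-zero" is tested with ne(0); if the data can contain negative values and only positives should count, gt(0) would be the test to use instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 200 rows, so the 1% threshold corresponds to 2 rows.
df = pd.DataFrame(np.zeros((200, 4), dtype=int), columns=["A", "B", "C", "D"])
df.loc[0, "B"] = 23           # 1 non-zero row   -> 0.5% of rows
df.loc[[0, 1, 2], "C"] = 7    # 3 non-zero rows  -> 1.5% of rows
df.loc[:49, "D"] = 5          # 50 non-zero rows -> 25% of rows (label slice is inclusive)

nonzero_fraction = df.ne(0).mean()            # fraction of non-zero entries per column
kept = df.loc[:, nonzero_fraction >= 0.01]    # boolean Series selects columns via .loc

print(list(kept.columns))
```

Note that the boolean Series is indexed by column label, which is why it goes in the column position of .loc rather than directly inside df[...].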
You can use loc to select the specified columns or rows for a new df, as the answer shows. Basically you can do:
df.loc[rows, cols] # accepts boolean lists/arrays
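A small illustration of that indexing pattern, assuming a made-up 3x3 frame (the labels r1..r3 and a..c are hypothetical): each boolean list must have one entry per row or per column, and only the positions marked True are kept.

```python
import pandas as pd

# Hypothetical frame used only to illustrate boolean selection with .loc.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

rows = [True, False, True]   # keep r1 and r3
cols = [False, True, True]   # keep b and c

sub = df.loc[rows, cols]
print(sub)
```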
So a df with the sparse columns removed can be obtained like this:
col_condition = df[df > 0].count() / df.shape[0] >= .01
df_ = df.loc[:, col_condition]
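If you need to swap columns and rows, just use
df.T
so the same approach works for rows whose number of non-zero values is less than 1% of the number of columns: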
row_condition = df.T[df.T > 0].count() / df.shape[1] >= .01
df_ = df[row_condition]
In a more concise form:
df_ = df.loc[:, df.gt(0).mean() >= .01] # keep columns
df_ = df[df.T.gt(0).mean() >= .01] # keep rows
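The two one-liners can also be wrapped in a small helper. This is only a sketch: drop_sparse and its parameters are hypothetical names introduced here, it tests for non-zero with ne(0) rather than gt(0), and it uses mean(axis=1) for rows instead of transposing, which should give the same mask without building df.T.

```python
import numpy as np
import pandas as pd

def drop_sparse(df, min_fraction=0.01, axis="columns"):
    """Keep columns (or rows) whose share of non-zero entries is >= min_fraction."""
    nonzero = df.ne(0)
    if axis == "columns":
        return df.loc[:, nonzero.mean(axis=0) >= min_fraction]
    if axis == "rows":
        return df.loc[nonzero.mean(axis=1) >= min_fraction]
    raise ValueError("axis must be 'columns' or 'rows'")

# Hypothetical 200 x 200 frame of zeros with a dense 10 x 10 corner and one stray value.
df = pd.DataFrame(np.zeros((200, 200), dtype=int))
df.iloc[0:10, 0:10] = 1   # columns 0-9 and rows 0-9 are each 5% non-zero
df.iloc[50, 20] = 1       # column 20 and row 50 are each 0.5% non-zero

cols_kept = drop_sparse(df, axis="columns")
rows_kept = drop_sparse(cols_kept, axis="rows")
print(cols_kept.shape, rows_kept.shape)
```

One design point to keep in mind: the order of the two steps matters, because after columns are dropped the row fraction is computed relative to the remaining columns only.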