Python: filter a DataFrame based on rows being less than a percentage of the sum of any column
Here is my sample data:
{'Rhesus': {('count', u'augCGP,transMap'): 6.0, ('count', u'augTM,transMap'): 11563.0, ('count', u'transMap'): 39930.0, ('count', u'augTM'): 5114.0, ('count', u'augCGP,augTM,augTMR,transMap'): 27.0, ('count', u'augCGP,augTMR'): 1.0, ('count', u'augTMR,transMap'): 145.0, ('count', u'augTMR'): 4217.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): nan, ('count', u'augCGP'): 4239.0, ('count', u'augCGP,augTM,transMap'): 3.0, ('count', u'augTM,augTMR,transMap'): 6296.0, ('count', u'augTM,augTMR'): 3357.0}, 'Susie': {('count', u'augCGP,transMap'): 11.0, ('count', u'augTM,transMap'): 10821.0, ('count', u'transMap'): 41300.0, ('count', u'augTM'): 2894.0, ('count', u'augCGP,augTM,augTMR,transMap'): 43.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 353.0, ('count', u'augTMR'): 5399.0, ('count', u'augCGP,augTMR,transMap'): 1.0, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 2740.0, ('count', u'augCGP,augTM,transMap'): 2.0, ('count', u'augTM,augTMR,transMap'): 10196.0, ('count', u'augTM,augTMR'): 2789.0}, 'Clint': {('count', u'augCGP,transMap'): 16.0, ('count', u'augTM,transMap'): 17341.0, ('count', u'transMap'): 39284.0, ('count', u'augTM'): 2888.0, ('count', u'augCGP,augTM,augTMR,transMap'): 80.0, ('count', u'augCGP,augTMR'): 1.0, ('count', u'augTMR,transMap'): 144.0, ('count', u'augTMR'): 2881.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 2338.0, ('count', u'augCGP,augTM,transMap'): 8.0, ('count', u'augTM,augTMR,transMap'): 8725.0, ('count', u'augTM,augTMR'): 1441.0}, 'Orangutan': {('count', u'augCGP,transMap'): 7.0, ('count', u'augTM,transMap'): 6568.0, ('count', u'transMap'): 46113.0, ('count', u'augTM'): 3656.0, ('count', u'augCGP,augTM,augTMR,transMap'): 17.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 284.0, ('count', u'augTMR'): 5952.0, ('count', u'augCGP,augTMR,transMap'): 1.0, ('count', u'augCGP,augTM,augTMR'): 1.0, 
('count', u'augCGP'): 5753.0, ('count', u'augCGP,augTM,transMap'): 3.0, ('count', u'augTM,augTMR,transMap'): 6567.0, ('count', u'augTM,augTMR'): 3520.0}, 'Gibbon': {('count', u'augCGP,transMap'): 5.0, ('count', u'augTM,transMap'): 6828.0, ('count', u'transMap'): 44285.0, ('count', u'augTM'): 4313.0, ('count', u'augCGP,augTM,augTMR,transMap'): 16.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 187.0, ('count', u'augTMR'): 6550.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 4178.0, ('count', u'augCGP,augTM,transMap'): nan, ('count', u'augTM,augTMR,transMap'): 5839.0, ('count', u'augTM,augTMR'): 3882.0}}
This is a DataFrame that looks like:
>>> df
genome                                Clint   Gibbon  Orangutan   Rhesus  \
      Transcript Modes
count augCGP                         2338.0   4178.0     5753.0   4239.0
      augCGP,augTM,augTMR               1.0      1.0        1.0      NaN
      augCGP,augTM,augTMR,transMap     80.0     16.0       17.0     27.0
      augCGP,augTM,transMap             8.0      NaN        3.0      3.0
      augCGP,augTMR                     1.0      NaN        NaN      1.0
      augCGP,augTMR,transMap            NaN      NaN        1.0      NaN
      augCGP,transMap                  16.0      5.0        7.0      6.0
      augTM                          2888.0   4313.0     3656.0   5114.0
      augTM,augTMR                   1441.0   3882.0     3520.0   3357.0
      augTM,augTMR,transMap          8725.0   5839.0     6567.0   6296.0
      augTM,transMap                17341.0   6828.0     6568.0  11563.0
      augTMR                         2881.0   6550.0     5952.0   4217.0
      augTMR,transMap                 144.0    187.0      284.0    145.0
      transMap                      39284.0  44285.0    46113.0  39930.0

genome                                Susie
      Transcript Modes
count augCGP                         2740.0
      augCGP,augTM,augTMR               1.0
      augCGP,augTM,augTMR,transMap     43.0
      augCGP,augTM,transMap             2.0
      augCGP,augTMR                     NaN
      augCGP,augTMR,transMap            1.0
      augCGP,transMap                  11.0
      augTM                          2894.0
      augTM,augTMR                   2789.0
      augTM,augTMR,transMap         10196.0
      augTM,transMap                10821.0
      augTMR                         5399.0
      augTMR,transMap                 353.0
      transMap                      41300.0
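As you can see, some of these categories have very few entries. I want to filter the rows (Transcript Modes) so that a row is removed if it represents less than 1% of the column total in every column. The resulting DataFrame would then look like this: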
>>> df
genome                         Clint   Gibbon  Orangutan   Rhesus  \
      Transcript Modes
count augCGP                  2338.0   4178.0     5753.0   4239.0
      augTM                   2888.0   4313.0     3656.0   5114.0
      augTM,augTMR            1441.0   3882.0     3520.0   3357.0
      augTM,augTMR,transMap   8725.0   5839.0     6567.0   6296.0
      augTM,transMap         17341.0   6828.0     6568.0  11563.0
      augTMR                  2881.0   6550.0     5952.0   4217.0
      transMap               39284.0  44285.0    46113.0  39930.0

genome                         Susie
      Transcript Modes
count augCGP                  2740.0
      augTM                   2894.0
      augTM,augTMR            2789.0
      augTM,augTMR,transMap  10196.0
      augTM,transMap         10821.0
      augTMR                  5399.0
      transMap               41300.0
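The filter described above can be sketched as follows. This is only one possible reading of the requirement, not necessarily the approach used in the original answer: each cell is divided by its column total, and a row is kept if it reaches the threshold in at least one column. The function name `filter_rare_rows`, the parameter `min_fraction`, and the reduced sample (three genomes, six modes, a flat index instead of the `count` MultiIndex level) are assumptions made for illustration, so the column totals here differ from those of the full table.

```python
import pandas as pd

nan = float("nan")

# Hypothetical reduced sample taken from the data above
# (flat index, three genomes, six transcript modes).
data = {
    "Clint": {"augCGP": 2338.0, "augCGP,augTMR": 1.0, "augCGP,transMap": 16.0,
              "augTM": 2888.0, "augTMR,transMap": 144.0, "transMap": 39284.0},
    "Gibbon": {"augCGP": 4178.0, "augCGP,augTMR": nan, "augCGP,transMap": 5.0,
               "augTM": 4313.0, "augTMR,transMap": 187.0, "transMap": 44285.0},
    "Susie": {"augCGP": 2740.0, "augCGP,augTMR": nan, "augCGP,transMap": 11.0,
              "augTM": 2894.0, "augTMR,transMap": 353.0, "transMap": 41300.0},
}
df = pd.DataFrame(data)


def filter_rare_rows(frame, min_fraction=0.01):
    """Keep rows that hold at least `min_fraction` of the column total
    in at least one column (illustrative helper, not a pandas API)."""
    # frame.sum() skips NaN by default; the division aligns on column labels.
    share = frame / frame.sum()
    # NaN compares as False, so missing cells count as below the threshold.
    keep = (share >= min_fraction).any(axis=1)
    return frame[keep]


filtered = filter_rare_rows(df)
print(filtered)
```

If a row should instead be dropped as soon as it falls below the threshold in any single column, replacing `.any(axis=1)` with `.all(axis=1)` would express that stricter reading; on the sample data in this question both readings appear to select the same rows.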
Filtering on that condition yields:
                               Clint   Gibbon  Orangutan   Rhesus    Susie
count augCGP                  2338.0   4178.0     5753.0   4239.0   2740.0
      augTM                   2888.0   4313.0     3656.0   5114.0   2894.0
      augTM,augTMR            1441.0   3882.0     3520.0   3357.0   2789.0
      augTM,augTMR,transMap   8725.0   5839.0     6567.0   6296.0  10196.0
      augTM,transMap         17341.0   6828.0     6568.0  11563.0  10821.0
      augTMR                  2881.0   6550.0     5952.0   4217.0   5399.0
      transMap               39284.0  44285.0    46113.0  39930.0  41300.0