R: incorrect subsetting based on colSums


I have a large data frame (100 x 1748); below is a reduced 9 x 10 version:

                Fraction Treatment Time Replicate        A        B         C        D         E         F
LL10.5AT           T        LL      10.5    A        11.11428 11.82154 10.445625 8.849699 10.373386  9.109676
LL10.5BT           T        LL      10.5    B        12.17890 11.01224 11.720548 9.405390 10.206708 10.653205
LL10.5CT           T        LL      10.5    C        10.80697 11.19782 11.175291 8.305949  9.696153  8.791403
OL10.5AT           T        OL      10.5    A        10.46481 10.81123  9.975277 7.783538  9.784773  8.640531
OL10.5BT           T        OL      10.5    B        10.75621 10.76371 10.625745 7.592059  9.820686  8.760861
OL10.5CT           T        OL      10.5    C        12.00054 11.02080 11.615536 8.903105  9.963635 10.547791
HL10.5AT           T        HL      10.5    A        10.87092 11.45102 10.780183 6.422136 10.424391  9.489396
HL10.5BT           T        HL      10.5    B        12.12334 11.29960 11.541679 9.774041  9.563639 10.532936
HL10.5CT           T        HL      10.5    C        10.21460 10.64746  9.886603 7.834040  9.828347  8.261546
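I want to subset it so that it contains only the columns whose sum is > 100. I use the following code, but it does not select the correct columns: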

dt.sub <- dt[,colSums(dt[,5:ncol(dt)]) > 100]
Thanks for your help,
Kasia

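You are indexing the columns incorrectly. colSums(dt[, 5:ncol(dt)]) > 100 has one element per column from column 5 onwards, i.e. ncol(dt) - 4 elements, so its positions are shifted by 4 relative to the columns of dt (and R recycles the shorter logical vector across all columns). Build the column indices explicitly instead: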

df[, c(1:4, which(colSums(df[, 5:ncol(df)]) > 100) + 4)]
#         Fraction Treatment Time Replicate        A        B
#LL10.5AT     TRUE        LL 10.5         A 11.11428 11.82154
#LL10.5BT     TRUE        LL 10.5         B 12.17890 11.01224
#LL10.5CT     TRUE        LL 10.5         C 10.80697 11.19782
#OL10.5AT     TRUE        OL 10.5         A 10.46481 10.81123
#OL10.5BT     TRUE        OL 10.5         B 10.75621 10.76371
#OL10.5CT     TRUE        OL 10.5         C 12.00054 11.02080
#HL10.5AT     TRUE        HL 10.5         A 10.87092 11.45102
#HL10.5BT     TRUE        HL 10.5         B 12.12334 11.29960
#HL10.5CT     TRUE        HL 10.5         C 10.21460 10.64746
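Explanation: which(colSums(df[, 5:ncol(df)]) > 100) returns the indices, within df[, 5:ncol(df)] (not within df!), of the columns whose sum is > 100. We then add 4 (because we started at index 5) and prepend columns 1 to 4, which gives the indices of the columns in df that we want to keep.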
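A minimal sketch of the misalignment and two equivalent fixes, using a small made-up data frame (toy, keep, wrong, fix_which and fix_logical are illustrative names, and the values are invented, not taken from the question). It has 2 metadata columns instead of 4, so the offset is 2:

```r
# Made-up example: 2 metadata columns followed by 3 value columns
toy <- data.frame(
  id  = c("a", "b"),
  grp = c("x", "y"),
  v1  = c(60, 50),  # sum 110 -> should be kept
  v2  = c(10, 20),  # sum  30 -> should be dropped
  v3  = c(70, 40)   # sum 110 -> should be kept
)

# One TRUE/FALSE per value column only: length 3, while toy has 5 columns
keep <- colSums(toy[, 3:ncol(toy)]) > 100

# Question's approach: the logical vector is not aligned with toy's columns
# and is recycled, so it selects id, v1 and v2 instead of id, grp, v1 and v3
wrong <- toy[, keep]

# Fix 1 (the answer's approach): convert to positions, then shift them
fix_which <- toy[, c(1:2, which(keep) + 2)]

# Fix 2: pad the logical vector with TRUE for the metadata columns
fix_logical <- toy[, c(rep(TRUE, 2), keep)]

names(wrong)                       # "id" "v1" "v2"
names(fix_which)                   # "id" "grp" "v1" "v3"
identical(fix_which, fix_logical)  # TRUE
```

Padding the logical vector avoids the which() + offset arithmetic, but both fixes still assume the metadata columns come first.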


Sample data:
df <- read.table(text =
    "                Fraction Treatment Time Replicate        A        B         C        D         E         F
LL10.5AT           T        LL      10.5    A        11.11428 11.82154 10.445625 8.849699 10.373386  9.109676
LL10.5BT           T        LL      10.5    B        12.17890 11.01224 11.720548 9.405390 10.206708 10.653205
LL10.5CT           T        LL      10.5    C        10.80697 11.19782 11.175291 8.305949  9.696153  8.791403
OL10.5AT           T        OL      10.5    A        10.46481 10.81123  9.975277 7.783538  9.784773  8.640531
OL10.5BT           T        OL      10.5    B        10.75621 10.76371 10.625745 7.592059  9.820686  8.760861
OL10.5CT           T        OL      10.5    C        12.00054 11.02080 11.615536 8.903105  9.963635 10.547791
HL10.5AT           T        HL      10.5    A        10.87092 11.45102 10.780183 6.422136 10.424391  9.489396
HL10.5BT           T        HL      10.5    B        12.12334 11.29960 11.541679 9.774041  9.563639 10.532936
HL10.5CT           T        HL      10.5    C        10.21460 10.64746  9.886603 7.834040  9.828347  8.261546", header = TRUE)
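Hard-coding the offset 4 in two places is easy to get wrong if the number of leading metadata columns changes. A sketch of a small helper that applies the offset once; keep_cols_by_sum, n_meta and threshold are hypothetical names (not part of base R or the original answer) and the toy data is made up. It selects metadata by position rather than by type, because metadata such as Time can be numeric too:

```r
# Hypothetical helper: keep the first n_meta columns, plus every later
# column whose column sum is greater than threshold
keep_cols_by_sum <- function(df, n_meta, threshold) {
  is_meta    <- seq_along(df) <= n_meta
  value_cols <- names(df)[!is_meta]
  sums <- colSums(df[, value_cols, drop = FALSE])
  df[, c(names(df)[is_meta], value_cols[sums > threshold]), drop = FALSE]
}

# Made-up data: 2 metadata columns (one of them numeric) + 3 value columns
toy <- data.frame(
  sample = c("s1", "s2", "s3"),
  time   = c(10.5, 10.5, 10.5),  # numeric metadata (sum 31.5), always kept
  g1     = c(40, 35, 30),        # sum 105 -> kept
  g2     = c(5, 6, 7),           # sum  18 -> dropped
  g3     = c(50, 50, 1)          # sum 101 -> kept
)

res <- keep_cols_by_sum(toy, n_meta = 2, threshold = 100)
names(res)  # "sample" "time" "g1" "g3"
```

For the sample data above, keep_cols_by_sum(df, n_meta = 4, threshold = 100) should select the same columns as the answer's df[, c(1:4, which(colSums(df[, 5:ncol(df)]) > 100) + 4)].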