Filtering/subsetting across multiple columns in R


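Index   odx1    odx2    odx3    odx4    odx5
1       123     0       0       0       0
2       0       321     0       0       0
3       0       0       0       123     0
4       0       321     0       0       0
5       0       0       0       0       0

Above is a sample of my dataset. I want to filter across multiple columns in R to keep only the rows that contain, for example, 123 or 321.

So far I have tried this with dplyr:

df %>% filter(., odx1==123 | odx2==123 | odx3==123 | odx4==123 | odx5==123 | odx1==321 | odx2==321 | odx3==321 | odx4==321 | odx5==321)

Although this works, is there a more concise way?

My actual dataset contains columns odx1 to odx25, and I have a list of about 15 strings to filter on across roughly 100K rows.

Edit:

In reality the dataset contains random numeric strings; I used 0 in the example only to keep it simple and easy to read.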

Index   odx1    odx2    odx3    odx4    odx5
1       123     421     532     414     981
2       243     321     765     132     321
3       144     322     587     123     444
4       655     321     459     091     676
5       456     421     523     431     768

As I said in my comment:

If the data always follows the general format of your original example and you only want to drop the observations that consist entirely of 0s, a slightly faster solution (in both keystrokes and computation time) is:

df[rowSums(df[, -1] != 0) != 0, ]

Alternatively, if you need to filter on an explicit set of values (you said you have 15 strings to filter on), you can use this to filter across all columns:

library(dplyr)
conditions.to.match <- c(123, 321)
df %>% filter(Reduce('|', lapply(df, '%in%', conditions.to.match)))
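To make the Reduce/%in% idea above concrete, here is a minimal base R sketch on the first sample table from the question. The names `odx_cols`, `hits` and `result` are made up for illustration. The one change from the answer's code is that the match is restricted to the odx columns, on the assumption that you would not want an Index value that happens to equal 123 or 321 to count as a hit.

```r
# Sample data from the question (first table).
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = c(0, 0, 0, 0, 0),
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = c(0, 0, 0, 0, 0)
)

conditions.to.match <- c(123, 321)

# Illustrative name: select only the odx* columns so Index is never tested.
odx_cols <- grep("^odx", names(df), value = TRUE)

# lapply() yields one logical vector per column;
# Reduce() ORs them together, giving one TRUE/FALSE per row.
hits <- Reduce(`|`, lapply(df[odx_cols], `%in%`, conditions.to.match))

result <- df[hits, ]
print(result)
```

With 25 columns and 15 values only `conditions.to.match` grows; the filtering line stays the same.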

Base R:

df[apply(df, 1, function(x) {any(x == 123 | x == 321)}),]

dplyr:

library(dplyr)
filter(df, rowSums(mutate_each(df, funs(. %in% c(123, 321)))) >= 1L)

Output: the four matching rows, shown in the table at the end of this page.
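`mutate_each()` and `funs()` are deprecated in current dplyr releases. As a hedged alternative, not the answerer's own code, the same row filter can be sketched with `if_any()`, assuming dplyr 1.0.4 or later is installed. The name `targets` is illustrative.

```r
library(dplyr)

# Sample data from the question (first table).
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = c(0, 0, 0, 0, 0),
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = c(0, 0, 0, 0, 0)
)

# Illustrative name for the set of values to look for.
targets <- c(123, 321)

# Keep a row if any odx* column holds one of the target values.
result <- df %>%
  filter(if_any(starts_with("odx"), ~ .x %in% targets))

print(result)
```

Using `starts_with("odx")` keeps Index out of the comparison and scales to odx1 through odx25 without listing the columns.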

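df[rowSums(df == 123 | df == 321) > 0, ]

If the data always follows this general format (you only want to drop the observations that consist entirely of 0s), a slightly faster solution (in both keystrokes and computation time) is:

df[rowSums(df != 0) != 0, ]

You need to exclude the Index column, so shouldn't it be df[, -1] inside rowSums? Also, if speed is key:

system.time(df[rowSums(df[, -1]) != 0, ])
   user  system elapsed
  2.744   0.798   3.894
system.time(df[rowSums(df != 0) != 0, ])
   user  system elapsed
  5.086   1.617   6.939

You are right. In my head, Index was the row.names (since people sometimes include them by accident), and I had already removed it before running the code above. I have edited my code. Thanks.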
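The variants discussed in these comments are not quite interchangeable, which a small sketch can show. It uses the first sample table plus one made-up row (`cancel`) whose values sum to zero. All variable names are illustrative, and no timings are given because they depend on the machine.

```r
# Sample data from the question (first table).
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = c(0, 0, 0, 0, 0),
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = c(0, 0, 0, 0, 0)
)

# Including Index keeps every row, because Index is never 0.
keep_with_index <- rowSums(df != 0) != 0

# Excluding Index: count the non-zero cells per row.
keep_count <- rowSums(df[, -1] != 0) != 0

# Excluding Index: sum the values per row (the variant timed in the comment).
keep_sum <- rowSums(df[, -1]) != 0

# Made-up row whose values cancel out, to show where the two differ.
cancel <- data.frame(Index = 6L, odx1 = 5, odx2 = -5)
cancel_sum <- rowSums(cancel[, -1]) != 0        # sum is 0, row would be dropped
cancel_count <- rowSums(cancel[, -1] != 0) != 0 # two non-zero cells, row is kept

# Timing harness only; results vary by machine.
print(system.time(df[keep_count, ]))
```

On non-negative numeric data the sum and count variants select the same rows, so the faster one is a fair choice. If the columns were stored as character strings (for example to preserve a value like 091), the sum variant would not apply, since `rowSums()` needs numeric input.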

  Index odx1 odx2 odx3 odx4 odx5
1     1  123    0    0    0    0
2     2    0  321    0    0    0
3     3    0    0    0  123    0
4     4    0  321    0    0    0