R: How to distinguish multiple columns based on a string
I have data like this:
df<- structure(list(`1` = structure(c(3L, 3L, 4L, 3L, 2L, 2L, 3L,
3L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L,
4L, 4L, 2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor"),
`2` = structure(c(4L, 5L, 4L, 5L, 4L, 4L, 4L, 5L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 1L, 4L, 5L, 5L, 5L, 4L,
2L), .Label = c("Het", "Het1-Het2", "Het2", "Homo", "No"), class = "factor"),
`3` = structure(c(3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 4L,
2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor")), class = "data.frame", row.names = c(NA,
-26L))
df

We can do this with the table() function and by sorting on frequency:
out <- data.frame(table(df))
out[order(out$Freq, decreasing = TRUE), ]  # partial output shown
X1 X2 X3 Freq
55 Homo Homo Homo 5
60 No No Homo 5
79 Homo No No 4
9 Het Het2 Het 2
54 Het1-Het2 Homo Homo 2
56 No Homo Homo 2
59 Homo No Homo 2
76 No Homo No 2
1 Het Het Het 1
26 Het1-Het2 Het1-Het2 Het1-Het2 1
2 Het1-Het2 Het Het 0
3 Homo Het Het 0
...
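For example, the Freq of 5 in the first row means that there are 5 rows in which X1, X2 and X3 are all Homo.
Likewise, the Freq of 4 in the third row means that there are 4 rows in which X1 is Homo, X2 is No and X3 is No.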
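To answer a specific count question from this frequency table, rather than reading it row by row, one option is to keep only the combinations that satisfy the condition and add up their Freq values. This is a minimal base-R sketch, assuming the condition is the one used in the other answers below (column 1 equal to "No", columns 2 and 3 both different from "No"); `keep` and `n_match` are illustrative names, not from the thread.

```r
# Question's data, copied from the dput() output above
df <- structure(list(`1` = structure(c(3L, 3L, 4L, 3L, 2L, 2L, 3L,
3L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L,
4L, 4L, 2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor"),
    `2` = structure(c(4L, 5L, 4L, 5L, 4L, 4L, 4L, 5L, 4L, 4L,
    4L, 5L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 1L, 4L, 5L, 5L, 5L, 4L,
    2L), .Label = c("Het", "Het1-Het2", "Het2", "Homo", "No"), class = "factor"),
    `3` = structure(c(3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 4L,
    2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor")), class = "data.frame", row.names = c(NA,
-26L))

# One row per combination of levels (4 * 5 * 4 = 80 rows);
# data.frame() turns the names 1, 2, 3 into X1, X2, X3
out <- data.frame(table(df))

# Each original row falls into exactly one combination
stopifnot(sum(out$Freq) == nrow(df))

# Keep the matching combinations and add up their counts
keep <- out$X1 == "No" & out$X2 != "No" & out$X3 != "No"
n_match <- sum(out$Freq[keep])
n_match  # 2
```

Only the No/Homo/Homo combination has a non-zero count here, which is why the result agrees with the filtering answers.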
With dplyr, you can simply filter for the values you want:
library(dplyr)

df %>%
  filter(`1` == "No",
         `2` != "No" & `3` != "No")
1 2 3
1 No Homo Homo
2 No Homo Homo
Or count the matching rows with tally():
df %>%
  filter(`1` == "No",
         `2` != "No" & `3` != "No") %>%
  tally()
n
1 2
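Of course, @Luis's solution is simpler (and preferable, in my book), as long as you modify it to meet your condition (that is, use & instead of | between columns 2 and 3). Here is the modified version, assuming I read your request correctly: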
df[df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"),]
1 2 3
9 No Homo Homo
16 No Homo Homo
sum(df$`1` == "No" & (df$`2` != "No" & df$`3` != "No"))
[1] 2
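The effect of switching from | to & can be checked by evaluating both conditions on the question's data side by side. A base-R sketch; `any_not_no` and `both_not_no` are illustrative names, not from the thread.

```r
# Question's data, copied from the dput() output above
df <- structure(list(`1` = structure(c(3L, 3L, 4L, 3L, 2L, 2L, 3L,
3L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L,
4L, 4L, 2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor"),
    `2` = structure(c(4L, 5L, 4L, 5L, 4L, 4L, 4L, 5L, 4L, 4L,
    4L, 5L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 1L, 4L, 5L, 5L, 5L, 4L,
    2L), .Label = c("Het", "Het1-Het2", "Het2", "Homo", "No"), class = "factor"),
    `3` = structure(c(3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 4L,
    2L), .Label = c("Het", "Het1-Het2", "Homo", "No"), class = "factor")), class = "data.frame", row.names = c(NA,
-26L))

# "|": column 1 is "No" and AT LEAST ONE of columns 2 and 3 is not "No"
any_not_no <- df$`1` == "No" & (df$`2` != "No" | df$`3` != "No")

# "&": column 1 is "No" and NEITHER column 2 nor column 3 is "No"
both_not_no <- df$`1` == "No" & (df$`2` != "No" & df$`3` != "No")

sum(any_not_no)     # 9
sum(both_not_no)    # 2
which(both_not_no)  # 9 16
```

With |, all 9 rows whose first column is "No" are counted, because none of them has "No" in both of the other columns; only the & version singles out rows 9 and 16.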
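Have you considered logical comparisons? df$`1` == "No" & (df$`2` != "No" | df$`3` != "No") gives you the "No" values in the first column that are not in the second or third column. Also, just so you know, giving columns names that start with a number (or consist only of numbers) is not good practice. And in the future, it always helps to include your own attempt at solving the problem, so people can address specific problems with your code.
I like your answer. Thanks.
I liked and accepted your answer.
@Learner, great, thank you. Glad I could help :)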
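Following the comment's advice about column names, one way to drop the backticks is to convert the names to syntactic ones with base R's make.names(), which prepends "X" to names that start with a digit. A minimal sketch on a small hypothetical data frame (its values are made up for illustration, not taken from the question):

```r
# Hypothetical three-row data frame with numeric column names, as in the question
df <- data.frame(`1` = c("No", "Homo", "No"),
                 `2` = c("Homo", "No", "Homo"),
                 `3` = c("Homo", "No", "No"),
                 check.names = FALSE)

names(df)  # "1" "2" "3" -> must be written df$`1`

# make.names() turns "1" into "X1", and so on
names(df) <- make.names(names(df))
names(df)  # "X1" "X2" "X3"

# Columns can now be accessed without backticks
sum(df$X1 == "No" & df$X2 != "No" & df$X3 != "No")  # 1
```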