R: Add a count column to a data frame by counting occurrences of strings from another data frame in a comma-separated column


I have a data frame df1:

df1 <- structure(list(Id = c(0, 1, 3, 4), Support = c(17, 15, 10, 18
), Genes = structure(c(3L, 1L, 4L, 2L), .Label = c("BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1", 
"CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4", "FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B", 
"MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

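How can I create a new column in df1 that counts, for each row's Genes column, how many of the strings in df2 occur in it, to get the desired output below?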

    Id    |    Support    |     Genes    |    Counts    |
---------------------------------------------------------
    0     |       17      |FOS,BCL2,...  |      2       |
    1     |       15      |BMP2,TGFB1,...|      3       |
    3     |       10      |MAPK12,YWHAE..|      1       |
    4     |       18      |CBLC,TGFA,... |      4       |

There may be a more elegant solution, but this does the job:

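# df here is the question's df1; df2$V1 is assumed to hold one gene name per row
df$Counts <- unlist(lapply(df$Genes, function(x){
  # split the comma-separated string into individual gene names
  xx <- unlist(strsplit(as.character(x),split = ","))
  # count how many entries of df2$V1 are present among them
  sum(df2$V1 %in% xx)
}))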

(I think the count for the third row in your example above should be 2 rather than 1?)
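
For anyone who wants to try the split-and-match approach end to end, here is a minimal sketch. The question's df2 is not shown, so the df2 below is hypothetical and exists only to make the example runnable; the single column name V1 is taken from the answer's code. It uses vapply instead of unlist(lapply(...)) so the result is guaranteed to be one integer per row.

```r
# Genes column from the question, stored as character rather than factor.
df1 <- data.frame(
  Id = c(0, 1, 3, 4),
  Support = c(17, 15, 10, 18),
  Genes = c("FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B",
            "BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1",
            "MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD",
            "CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (assumption): the real df2 is not in the question.
df2 <- data.frame(
  V1 = c("FOS", "BCL2", "MAPK12", "MAPK13", "CBLC", "TGFA", "BMP2"),
  stringsAsFactors = FALSE
)

# Split every row into its gene names once.
gene_lists <- strsplit(df1$Genes, split = ",", fixed = TRUE)

# For each row, count how many df2 entries appear as exact gene names.
df1$Counts <- vapply(gene_lists,
                     function(genes) sum(df2$V1 %in% genes),
                     integer(1))

print(df1$Counts)  # 2 3 2 4 with the hypothetical df2 above
```

Because %in% compares whole gene names, a df2 entry only counts when it equals one of the comma-separated tokens exactly.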
Here is another option using the stringr library. It loops over the Genes column of df and uses the df2 data frame as the patterns.
#convert factors columns into characters
df$Genes<-as.character(df$Genes)
df2$V1<-as.character(df2$V1)

library(stringr)
#loop over the strings against the pattern from df2
df$Counts<-sapply(df$Genes, function(x){
  sum(str_count(x, df2$V1))
})



df
  Id Support                                      Genes Counts
1  0      17           FOS,BCL2,PIK3CD,NFKBIA,TNFRSF10B      2
2  1      15 BMP2,TGFB1,BMP3,MAPK12,GDF11,MAPK13,CITED1      3
3  3      10     MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4,PIK3CD      2
4  4      18  CBLC,TGFA,MAPK12,YWHAE,YWHAQ,MAPK13,SPRY4      4
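
One thing to keep in mind with the str_count version: its pattern argument is treated as a regular expression and matches substrings, so a shorter name can be counted inside a longer one. The sketch below illustrates this with made-up values (the gene string and the patterns are assumptions, not the question's data) and shows one possible workaround using word boundaries.

```r
library(stringr)

# Made-up inputs for illustration only.
genes <- "MAPK12,YWHAE,MAPK13"
patterns <- c("MAPK1", "YWHAE")

# Substring matching: "MAPK1" is found inside both "MAPK12" and "MAPK13".
substring_total <- sum(str_count(genes, patterns))

# Word-boundary matching: "MAPK1" no longer matches, "YWHAE" still does.
token_total <- sum(str_count(genes, paste0("\\b", patterns, "\\b")))

print(substring_total)  # 3
print(token_total)      # 1
```

Word boundaries are only a rough fix, since a name containing a hyphen or other non-word character would still be split; splitting on commas and comparing whole names avoids that.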

You are right, that was a typo on my part. Thanks for your answer!