Finding strings in a data.frame to populate new columns


I used dplyr on my data to create a subset like the following:

dd <- data.frame(ID = c(700689L, 712607L, 712946L, 735907L, 735908L, 735910L, 735911L, 735912L, 735913L, 746929L, 747540L), 
`1` = c("eg", NA, NA, "eg", "eg", NA, NA, NA, NA, "eg", NA), 
`2` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, "eg", NA), 
`3` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, NA, NA), 
`4` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA), 
`5` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA), 
`6` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA))

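Based on your description, you want one column that flags "eg" and another that flags "lk" or "sk". If that is the case, the following base R approach will work: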

dfNew <- cbind(id=dd[1],
               eg=pmin(rowSums(dd[-1] == "eg", na.rm=TRUE), 1),
               other=pmin(rowSums(dd[-1] == "sk" | dd[-1] == "lk", na.rm=TRUE), 1))
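To make the three steps of that expression visible (compare, count per row, cap at 1), here is a minimal sketch on a small made-up data frame. The names toy, is_eg, n_eg and eg_flag are hypothetical and used only for illustration.

```r
# Hypothetical toy data: three rows, two value columns
toy <- data.frame(ID = 1:3,
                  a  = c("eg", NA, "sk"),
                  b  = c("eg", NA, NA),
                  stringsAsFactors = FALSE)

# Step 1: element-wise comparison gives a logical matrix; NA cells stay NA
is_eg <- toy[-1] == "eg"

# Step 2: count the matches in each row, ignoring the NA cells
n_eg <- rowSums(is_eg, na.rm = TRUE)

# Step 3: cap the count at 1 so the result is a 0/1 flag
eg_flag <- pmin(n_eg, 1)
```

Row 1 matches twice, so n_eg is 2 there and pmin() brings it down to 1; the other two rows have no match and stay at 0.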

Here is an admittedly hacky dplyr/purrr solution. Since your IDs never seem to equal "eg", "sk" or "lk", I did not add anything to exclude the ID column from the search.

library(dplyr)
library(purrr)
dd %>% 
    split(.$ID) %>% 
    map_df(~ tibble(  # tibble() replaces the deprecated data_frame()
        ID = .x$ID, 
        eg = ifelse(any(.x == 'eg', na.rm = TRUE), 1, 0), 
        other = ifelse(any(.x == 'lk' | .x == 'sk', na.rm = TRUE), 1, 0)
    ))
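For comparison, a row-wise check can also be written without splitting by ID. The sketch below is not the answer's method: it swaps in dplyr's if_any() and assumes dplyr 1.0.4 or later is installed. The toy data and the names value_cols and flags are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with the same shape as the question's data
toy <- data.frame(ID = 1:3,
                  a  = c("eg", NA, "sk"),
                  b  = c("eg", NA, "lk"),
                  stringsAsFactors = FALSE)

# Every column except ID is searched
value_cols <- setdiff(names(toy), "ID")

flags <- toy %>%
  mutate(
    # %in% returns FALSE for NA, so no na.rm handling is needed
    eg    = as.integer(if_any(all_of(value_cols), ~ .x %in% "eg")),
    other = as.integer(if_any(all_of(value_cols), ~ .x %in% c("sk", "lk")))
  ) %>%
  select(ID, eg, other)
```

Selecting the value columns by name keeps the ID column out of the search explicitly instead of relying on IDs never matching one of the strings.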
Maybe simpler:
x = dd[-1] == 'eg'; cbind(dd[1], 1*!!rowSums(x, na.rm=T), 1*!!rowSums(!x, na.rm=T))

Nice. Using 1*!! to convert integers to binary 0/1 is very cool. +!! would also work, but it is less explicit.
dfNew
       ID eg other
1  700689  1     0
2  712607  0     0
3  712946  0     0
4  735907  1     1
5  735908  1     1
6  735910  0     0
7  735911  0     0
8  735912  0     0
9  735913  0     0
10 746929  1     0
11 747540  0     0
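The one-liner suggested in the comments can be checked against the table above. This is a sketch that assumes the comment's comparison was meant to be == and that adds the column names eg and other for readability; the name res is hypothetical.

```r
# Same data as in the question
dd <- data.frame(ID = c(700689L, 712607L, 712946L, 735907L, 735908L, 735910L,
                        735911L, 735912L, 735913L, 746929L, 747540L),
                 `1` = c("eg", NA, NA, "eg", "eg", NA, NA, NA, NA, "eg", NA),
                 `2` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, "eg", NA),
                 `3` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, NA, NA),
                 `4` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA),
                 `5` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA),
                 `6` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA))

# TRUE where a cell is "eg", FALSE for any other string, NA for missing cells
x <- dd[-1] == "eg"

# !! turns a row count into TRUE/FALSE; multiplying by 1 turns that into 1/0
res <- cbind(dd[1],
             eg    = 1 * !!rowSums(x,  na.rm = TRUE),
             other = 1 * !!rowSums(!x, na.rm = TRUE))
```

Note that !x flags every non-missing cell that is not "eg", so other here means "any other string", which coincides with "sk" or "lk" for this data.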