Find strings in a data.frame to populate new columns
I created the following subset of my data using dplyr:
dd <- data.frame(ID = c(700689L, 712607L, 712946L, 735907L, 735908L, 735910L, 735911L, 735912L, 735913L, 746929L, 747540L),
`1` = c("eg", NA, NA, "eg", "eg", NA, NA, NA, NA, "eg", NA),
`2` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, "eg", NA),
`3` = c(NA, NA, NA, "sk", "lk", NA, NA, NA, NA, NA, NA),
`4` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA),
`5` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA),
`6` = c(NA, NA, NA, "lk", "lk", NA, NA, NA, NA, NA, NA))
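Based on your description, you want one column that checks for "eg" and another that checks for "lk" and "sk". If that is the case, the following base R approach will work: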
dfNew <- cbind(id=dd[1],
eg=pmin(rowSums(dd[-1] == "eg", na.rm=TRUE), 1),
other=pmin(rowSums(dd[-1] == "sk" | dd[-1] == "lk", na.rm=TRUE), 1))
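To make the steps in the base R answer easier to follow, here is a minimal sketch on a hypothetical three-row data frame (toy and its values are made up for illustration, not taken from the question). It assumes the string columns are plain character vectors. Comparing a data frame with a string gives a logical matrix, rowSums counts the matches per row while ignoring NA, and pmin caps that count at 1.

```r
# Hypothetical stand-in for dd: an ID column plus two string columns
toy <- data.frame(ID = 1:3,
                  X1 = c("eg", NA, "eg"),
                  X2 = c("eg", NA, "sk"),
                  stringsAsFactors = FALSE)

# Logical matrix: TRUE where the cell equals "eg", NA where the cell is NA
is_eg <- toy[-1] == "eg"

# Number of "eg" cells per row, with NA cells ignored
counts <- rowSums(is_eg, na.rm = TRUE)

# Cap the count at 1 to get a 0/1 indicator
flag <- pmin(counts, 1)
```

Row 1 contains "eg" twice, so without pmin the new column would hold 2 rather than 1.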
Here is an admittedly hacky dplyr/purrr solution. Since your ID never seems to equal "eg", "sk", or "lk", I did not add anything to exclude the ID column from the search:
library(dplyr)
library(purrr)

dd %>%
  split(.$ID) %>%          # one data frame per ID
  map_df(~ tibble(         # tibble() replaces the deprecated data_frame()
    ID = .x$ID,
    eg = ifelse(any(.x == 'eg', na.rm = TRUE), 1, 0),
    other = ifelse(any(.x == 'lk' | .x == 'sk', na.rm = TRUE), 1, 0)
  ))
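As a possible alternative that avoids splitting by ID, the sketch below uses if_any() inside mutate(). This is not from the original answers. It assumes dplyr 1.0.4 or later, and that the string columns are named X1, X2, and so on (which is what data.frame() produces from names such as `1` under its default check.names = TRUE). The small data frame toy is hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for dd
toy <- data.frame(ID = c(1L, 2L, 3L),
                  X1 = c("eg", NA, "eg"),
                  X2 = c(NA, NA, "sk"),
                  stringsAsFactors = FALSE)

result <- toy %>%
  mutate(
    # %in% returns FALSE for NA, so no na.rm handling is needed
    eg    = as.integer(if_any(starts_with("X"), ~ .x %in% "eg")),
    other = as.integer(if_any(starts_with("X"), ~ .x %in% c("sk", "lk")))
  ) %>%
  select(ID, eg, other)
```

Selecting with starts_with("X") keeps the ID column, and the newly created eg column, out of the search.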
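Comment: Maybe simpler:
x = dd[-1] == 'eg'; cbind(dd[1], 1*!!rowSums(x, na.rm=T), 1*!!rowSums(!x, na.rm=T))
Comment: Nice. Using 1*!! to convert integers to binary 0/1 is pretty cool. Or +!!, but that is not very explicit.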
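For readers unfamiliar with the 1*!! idiom from the comments, here is a small sketch of what it does. The vector counts is a made-up example. The first ! turns each number into a logical (zero becomes TRUE, nonzero becomes FALSE), the second ! flips it back, and multiplying by 1 (or applying unary +) converts the logical to a number.

```r
# Hypothetical per-row match counts
counts <- c(0, 1, 3)

via_times <- 1 * !!counts          # numeric 0/1
via_plus  <- +!!counts             # integer 0/1
explicit  <- as.integer(counts > 0)  # more readable equivalent for non-negative counts
```

All three give a 0/1 indicator here. The explicit form is longer but arguably clearer about the intent.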
Printing dfNew from the base R approach gives:
dfNew
       ID eg other
1  700689  1     0
2  712607  0     0
3  712946  0     0
4  735907  1     1
5  735908  1     1
6  735910  0     0
7  735911  0     0
8  735912  0     0
9  735913  0     0
10 746929  1     0
11 747540  0     0