R: recover a data frame from data stored in variable-length concatenated strings

I have a data frame that contains, for each id, a series of features delimited by |:

df = data.frame(id = c("1","2","3"), 
features = c("1|2|3","4|5","6|7")
)
df
My goal is to have one column per feature, with an indicator of its presence for each id, e.g.

id | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
1  | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
2  | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
3  | 0 | 0 | 0 | 0 | 0 | 1 | 1 |

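The features are stored in another table, so the full list of possible features is available, but it would be even better if I could generate it dynamically.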

My first attempt was a very slow for loop that uses grepl() to fill a pre-created matrix 'm', e.g.

  for (i in 1:dim(df)[1]){
  print(i)
  if(grepl("1\\|", df$feature[i])) {m[i,1] <- 1}
  if(grepl("2\\|", df$feature[i])) {m[i,2] <- 1}
  if(grepl("3\\|", df$feature[i])) {m[i,3] <- 1}
  if(grepl("4\\|", df$feature[i])) {m[i,4] <- 1}
  if(grepl("5\\|", df$feature[i])) {m[i,5] <- 1}
  if(grepl("6\\|", df$feature[i])) {m[i,6] <- 1}
  if(grepl("7\\|", df$feature[i])) {m[i,7] <- 1}
}

The most natural object to return is a matrix:

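# split features column by pipe symbol and subset result, dropping pipes
# note: split="|" is read as a regular expression and splits between every
# character, so this only handles single-digit features; fixed=TRUE or
# split="\\|" would be needed for multi-digit features
temp <- lapply(strsplit(as.character(df$features), split="|"), function(i) i[i != "|"])
# use %in% to return logical vector of desired length, convert to integer and rbind list
myMat <- do.call(rbind, lapply(temp, function(i) as.integer(1:7 %in% i)))
# add id as row names
rownames(myMat) <- df$id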
If you really need a data.frame, you can use

temp <- lapply(strsplit(as.character(df$features), split="|"), function(i) i[i != "|"])
myDf <- cbind(id=df$id, data.frame(do.call(rbind,
                                          lapply(temp, function(i) as.integer(1:7 %in% i)))))
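For multi-digit features, and to avoid hard-coding 1:7, the same idea could be sketched as below. This is only an illustration, not the answerer's code: the feature 24 in the example data and the names parts and feature_ids are assumptions made for the example.

```r
# Example data with one multi-digit feature (the 24 is an illustrative assumption)
df <- data.frame(id = c("1", "2", "3"),
                 features = c("1|2|3", "24|5", "6|7"),
                 stringsAsFactors = FALSE)

# fixed = TRUE treats the pipe as a literal character, so "24" stays intact
parts <- strsplit(as.character(df$features), split = "|", fixed = TRUE)

# derive the list of features from the data instead of hard-coding 1:7
feature_ids <- sort(unique(as.integer(unlist(parts))))

# one row per id, one 0/1 column per feature
myMat <- do.call(rbind, lapply(parts, function(p) {
  as.integer(feature_ids %in% as.integer(p))
}))
dimnames(myMat) <- list(df$id, as.character(feature_ids))
myMat
```

The columns are named after the features actually found in the data, so a feature that never occurs gets no column.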

A different implementation of @lmo's solution, using stringr and dplyr for a more compact notation, which works for features from 1 up to any number:

library(stringr)
library(dplyr)

# Split the feature column on the literal pipe
temp <- str_split(df$features, "\\|")

# Find the maximum feature (convert to numeric first, so that the
# comparison is numeric rather than alphabetical)
maximum <- max(as.numeric(unlist(temp)), na.rm = TRUE)

# Create the final data frame
lapply(temp, function(i) as.integer(1:maximum %in% i)) %>%
    do.call(rbind, .) %>%
    as.data.frame() %>%
    cbind(df, .)

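Thanks to lmo and the other answerer. I ended up combining the two answers, because the actual feature range is 1…300 and the strsplit-based solution splits feature strings into single digits, i.e. 24 becomes "2", "4".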

Output of myDf from the data.frame version:

myDf
  id X1 X2 X3 X4 X5 X6 X7
1  1  1  1  1  0  0  0  0
2  2  0  0  0  1  1  0  0
3  3  0  0  0  0  0  1  1

Output of the stringr/dplyr pipeline:

  id features V1 V2 V3 V4 V5 V6 V7
1  1    1|2|3  1  1  1  0  0  0  0
2  2      4|5  0  0  0  1  1  0  0
3  3      6|7  0  0  0  0  0  1  1
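
Since the question mentions that the full list of possible features lives in another table, another option is to reshape to long format and cross-tabulate with base R's table(). This is a sketch that does not come from either answer; all_features is a hypothetical stand-in for that separate feature table, and long, tab and wide are names made up for the example.

```r
df <- data.frame(id = c("1", "2", "3"),
                 features = c("1|2|3", "4|5", "6|7"),
                 stringsAsFactors = FALSE)

# hypothetical stand-in for the separate table listing every possible feature
all_features <- 1:7

# split on the literal pipe, keeping multi-digit features intact
parts <- strsplit(df$features, split = "|", fixed = TRUE)

# long format: one (id, feature) pair per row
long <- data.frame(id = rep(df$id, lengths(parts)),
                   feature = as.integer(unlist(parts)))

# cross-tabulate id against feature; the factor levels force a column for
# every possible feature, including ones that never occur in the data
tab <- table(long$id, factor(long$feature, levels = all_features))

# cells are counts, which are 0/1 as long as no feature repeats within an id
wide <- as.data.frame.matrix(tab)
wide
```

Dropping the levels argument generates the columns dynamically from the features present in the data instead.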