R: Creating multiple dummy variables from one string variable

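I have tried just about everything, but I cannot get the result everyone else seems to get. Here is my problem:

I have a data frame like this one, listing the grades each teacher teaches: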

> profs <- data.frame(teaches = c("1st", "1st, 2nd",
                                  "2nd, 3rd",
                                  "1st, 2nd, 3rd"))
> profs
        teaches
1           1st
2      1st, 2nd
3      2nd, 3rd
4 1st, 2nd, 3rd

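According to an answer I found, the splitstackshape library and its apparently deprecated concat.split.expanded function should do exactly what I want. However, I cannot seem to reproduce that result: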

> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
Fehler in seq.default(min(vec), max(vec)) : 
  'from' cannot be NA, NaN or infinite

Using cSplit, which I understand supersedes "most of the earlier concat.split* functions", I get the following:

> cSplit(profs, "teaches")
   teaches_1 teaches_2 teaches_3
1:       1st        NA        NA
2:       1st       2nd        NA
3:       2nd       3rd        NA
4:       1st       2nd       3rd

I have read the help for cSplit and tweaked every one of its arguments, but I just cannot get the split I am after. Any help is greatly appreciated.

I found a workaround. concat.split.expanded seems to work if the string variable contains only the separator and numbers, for example:

> profs <- data.frame(teaches = c("1", "1, 2", "2, 3", "1, 2, 3"))
> profs
  teaches
1       1
2    1, 2
3    2, 3
4 1, 2, 3

However, I am still looking for a solution that does not involve stripping all the letters from my variable.

Here is another option:

Vectorize(grepl, 'pattern')(c('1st', '2nd', '3rd'), profs$teaches)
#        1st   2nd   3rd
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE  TRUE FALSE
# [3,] FALSE  TRUE  TRUE
# [4,]  TRUE  TRUE  TRUE

You can use mtabulate from qdapTools:

library(qdapTools)
res <- mtabulate(strsplit(as.character(profs$teaches), ', '))
colnames(res) <- paste0('teaches', colnames(res))
res
#    teaches1st teaches2nd teaches3rd
#1          1          0          0
#2          1          1          0
#3          0          1          1
#4          1          1          1
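
If installing qdapTools is not an option, the same split-then-tabulate idea can be sketched in base R. This is only an illustration, not how mtabulate is implemented; the names tokens, lvls and dummies are made up for the example, and the "teaches_" prefix is just a naming choice.

```r
# Same data as in the question, kept as character rather than factor.
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"),
                    stringsAsFactors = FALSE)

# Split each row on commas and trim the surrounding whitespace.
tokens <- lapply(strsplit(profs$teaches, ",", fixed = TRUE), trimws)

# Every distinct grade, sorted, becomes one dummy column.
lvls <- sort(unique(unlist(tokens)))

# One 0/1 row per teacher: is each level among that teacher's tokens?
dummies <- do.call(rbind, lapply(tokens, function(x) as.integer(lvls %in% x)))
colnames(dummies) <- paste0("teaches_", lvls)
dummies
```

The resulting integer matrix can be attached to the original data with cbind(profs, dummies) if the source column should be kept.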

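Because the concatenated data are strings (not numbers), you need to add type = "character" for the function to work as expected. The function defaults to numeric values, hence the error about NaN and so on.

The naming was made more consistent with the short forms of the other functions in the same family, so it is now cSplit_e (although the old function name still works).

The help page for ?concat.split.expanded is the same as the one for cSplit_e. If you have any tips for making it clearer, please open an issue on the package's GitHub page.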
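
To see why a numeric default trips over values such as "1st", here is a small base R sketch. It does not reproduce the package internals; it only assumes, based on the seq.default(min(vec), max(vec)) call quoted in the error message above, that the split pieces are coerced to numbers before a sequence is built from their range. The names pieces, vec and res are made up for the example.

```r
# The pieces of one cell, split on the comma and trimmed.
pieces <- trimws(strsplit("1st, 2nd", ",", fixed = TRUE)[[1]])

# "1st" and "2nd" are not numbers, so the coercion yields NA for both
# (R would normally warn about this; the warning is silenced here).
vec <- suppressWarnings(as.numeric(pieces))
vec

# min() and max() of an all-NA vector are NA, so seq() has no valid
# starting point and stops with an error, captured here as a string.
res <- tryCatch(seq(min(vec), max(vec)),
                error = function(e) conditionMessage(e))
res
```

With purely numeric cells such as "1, 2" the coercion succeeds, which matches the workaround described in the question.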
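
Thanks. I think I was so focused on reading the help for cSplit that I forgot to read the help for concat.split.expanded or cSplit_e. Those help files are very clear; I only wish ?cSplit had a link to those functions.
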
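Another option, using stringi:

library(stringi)
# One column per grade; adding 0L turns the logical matrix into 0/1 integers.
(vapply(c('1st', '2nd', '3rd'), stri_detect_fixed, logical(nrow(profs)),
        str = profs$teaches)) + 0L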
#     1st 2nd 3rd
#[1,]   1   0   0
#[2,]   1   1   0
#[3,]   0   1   1
#[4,]   1   1   1
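
One caveat with the pattern-detection approaches (grepl, stri_detect_fixed): they look for substrings, so a label that is contained in another label produces false positives. That cannot happen with the three grades in the question, but the base R sketch below shows the effect with a hypothetical "21st" label and how matching on the split tokens avoids it. The names grades, x, sub_match and tok_match are made up for the example.

```r
# Hypothetical labels chosen so that one is a substring of the other.
grades <- c("1st", "21st")
x <- c("21st", "1st, 21st")

# Substring search: "1st" is found inside "21st", so row 1 is wrongly flagged.
sub_match <- sapply(grades, grepl, x = x, fixed = TRUE)
sub_match

# Exact matching on the split, trimmed tokens does not have this problem.
tokens <- lapply(strsplit(x, ",", fixed = TRUE), trimws)
tok_match <- do.call(rbind, lapply(tokens, function(tk) grades %in% tk))
colnames(tok_match) <- grades
tok_match
```

If the labels can overlap like this, splitting first (as mtabulate and cSplit_e do) is the safer route.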

With cSplit_e from splitstackshape:

library(splitstackshape)
cSplit_e(profs, "teaches", ",", type = "character", fill = 0)
#         teaches teaches_1st teaches_2nd teaches_3rd
# 1           1st           1           0           0
# 2      1st, 2nd           1           1           0
# 3      2nd, 3rd           0           1           1
# 4 1st, 2nd, 3rd           1           1           1