R: Assigning values to a new column in a data.table by referencing strings in an existing column


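I have a table of variables describing the courses taken by students at a university. It is already in data.table format.

One of the columns, SCA_TITLE, contains the course name, with strings such as "Bachelor of Information Systems" and "Bachelor of Laws and Bachelor of Information Systems".

I want to create a new column named DOUBLE_DEGREE that is 1 where the student is doing a double degree and 0 where the student is not.

So, basically, SCA_TITLE has to satisfy one of the following conditions:

it contains the string "and Bachelor", or
the word "Bachelor" is repeated twice.

If so, the value in the new column should be set to 1; otherwise it should be set to 0.

Any assistance would be appreciated.

The SCA_TITLE column looks like the following. There are 465K observations and 65 variables: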

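204: Honours Degree of Bachelor of Science (Environmental Management)
205: Honours Degree of Bachelor of Science (Medical Bioscience)
206: Honours Degree of Bachelor of Science (Science Scholar Program)
207: Honours Degree of Bachelor of Visual Arts
208: Honours Degree of Bachelor of Visual Communication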

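You can try the following. It flags a row with 1 if its title contains 'and Bachelor' or if the same ID appears in more than one row: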

library(data.table)
# setDT() converts df to a data.table in place; := adds the column by reference
setDT(df)[, DOUBLE_DEGREE := as.numeric(grepl('and Bachelor',
                                        SCA_TITLE) | .N > 1), by = ID]
df
 #    ID                       SCA_TITLE DOUBLE_DEGREE
 # 1:  3                   Bachelor of A             0
 # 2:  2                   Bachelor of B             1
 # 3:  5                   Bachelor of C             1
 # 4:  4                   Bachelor of D             0
 # 5:  5                   Bachelor of E             1
 # 6:  7                   Bachelor of F             0
 # 7:  2 Bachelor of G and Bachelor of N             1
 # 8:  6                   Bachelor of H             1
 # 9:  6                   Bachelor of I             1
 #10:  2                   Bachelor of J             1

Update

If there are other degrees and only 'Bachelor' should be considered:

  setDT(df1)[, DOUBLE_DEGREE:= as.numeric(sum(grepl('Bachelor',
             SCA_TITLE))>1|grepl('and Bachelor', SCA_TITLE)), by=ID]

 df1
 #    ID                                         SCA_TITLE DOUBLE_DEGREE
 #1:  3                   Honours degree of Bachelor of A             0
 #2:  2                   Honours degree of Bachelor of B             1
 #3:  5                   Honours degree of Bachelor of C             1
 #4:  4                   Honours degree of Bachelor of D             0
 #5:  5                  Honours  degree of Bachelor of E             1
 #6:  7                   Honours degree of Bachelor of F             0
 #7:  9 Honours degree of Bachelor of G and Bachelor of N             1
 #8:  6                                       Some degree             0
 #9:  6                   Honours degree of Bachelor of I             0
 #10: 2                   Honours degree of Bachelor of J             1
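
The answers above treat "repeated" as the same ID having more than one 'Bachelor' row. If the intent is instead that the word appears twice within a single title, a per-row count may be enough. The following is a minimal sketch under that assumption; the toy table dt and the helper n_bachelor are hypothetical names used only for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical toy data; only the first two titles come from the question.
dt <- data.table(
  SCA_TITLE = c("Bachelor of Information Systems",
                "Bachelor of Laws and Bachelor of Information Systems",
                "Some degree")
)

# Hypothetical helper: count how many times "Bachelor" occurs in each title.
# gregexpr() finds all matches; regmatches() returns character(0) when none.
n_bachelor <- function(x) {
  lengths(regmatches(x, gregexpr("Bachelor", x, fixed = TRUE)))
}

# Add the flag by reference: 1 if the title names two or more Bachelor degrees.
dt[, DOUBLE_DEGREE := as.integer(n_bachelor(SCA_TITLE) >= 2L)]
dt$DOUBLE_DEGREE
# [1] 0 1 0
```

Because no grouping is involved, this variant does not depend on an ID column, but it will miss double degrees that are recorded as two separate rows for the same student.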

Data

Please consider showing a few rows of the dataset.

df <- structure(list(ID = c(3L, 2L, 5L, 4L, 5L, 7L, 2L, 6L, 6L, 2L), 
SCA_TITLE = c("Bachelor of A", "Bachelor of B", "Bachelor of C", 
"Bachelor of D", "Bachelor of E", "Bachelor of F", 
"Bachelor of G and Bachelor of N",     "Bachelor of H", "Bachelor of I",
"Bachelor of J")), .Names = c("ID", "SCA_TITLE"), row.names = c(NA, -10L),
class = "data.frame")

df1 <-  structure(list(ID = c(3, 2, 5, 4, 5, 7, 9, 6, 6, 2), SCA_TITLE =
c("Honours degree of Bachelor of A", "Honours degree of Bachelor of B",
"Honours degree of Bachelor of C", "Honours degree of Bachelor of D", 
"Honours  degree of Bachelor of E", "Honours degree of Bachelor of F", 
"Honours degree of Bachelor of G and Bachelor of N", "Some degree", 
"Honours degree of Bachelor of I", "Honours degree of Bachelor of J"
 )), .Names = c("ID", "SCA_TITLE"), row.names = c(NA, -10L),
 class = "data.frame")