R: Can extracted strings be saved in one column as separate character strings?
Suppose I need to assign courses to people based on the sentences in a comment column. (The actual data is more complicated than this; I have simplified it.) So I used regular expressions with regmatches(), gsub() and gregexpr() to extract strings from the comment sentences in my data. I then saved the resulting lists into columns and combined them as characters, as shown below:
>cbind.data.frame(level,software,month,stringsAsFactors = FALSE)
level software month
1 c("beginner1","beginner2") c++ Dec
2 NA Java Jan
3 "beginner3" NA May
4 "intermediate2" NA NA
5 NA Matlab Mar
6 "advanced1" c("java","c++") Jul
I want to:
- split the list c("beginner1","beginner2") into "beginner1","beginner2"
- drop the NAs
- keep the values as characters, like below
newcol
"beginner1","beginner2","c++","Dec"
"Java","Jan"
"beginner3", "May"
"intermediate2"
"Matlab", "Mar"
"advanced1","java","c++","Jul"
However, when I combine them, each row is merged into a single character string:
> newcol<-unite(combined, newcol, 1:ncol(combined), remove=TRUE, sep = ",")
"beginner1,beginner2,c++,Dec"
"Java,Jan"
"beginner3, May"
"intermediate2"
"Matlab, Mar"
"advanced1,java,c++,Jul"
Does this help?
A<-data.frame(a=c("a","b","c"),b=c("a","b","c"),c=c("a","b","c"))
apply(A,2,paste,collapse=",")
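Here is one approach using regmatches() and gregexpr():
# Extract every double-quoted value from a cell; return the cell unchanged if it has none.
f <- Vectorize(function(u) {
  z <- unlist(regmatches(u, gregexpr('\".*?\"', u, perl = TRUE)))
  if (length(z) > 0) {
    r <- gsub('\"', "", z)  # strip the surrounding quotes
  } else {
    r <- u
  }
  r
})
# Apply f to the non-NA cells of each row; rows differ in length, so the result is a list.
df$newcol <- apply(df, 1, function(x) f(na.omit(x)))
Data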
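To make the extraction step easier to follow, here is a minimal sketch of what the regex does on a single cell. The variable names (cell, quoted, values, plain, n_plain) are made up for illustration and are not part of the answer's code.

```r
# Hypothetical single cell, stored as text the way the question's data is.
cell <- "c(\"beginner1\",\"beginner2\")"

# gregexpr() locates every non-greedy run between double quotes;
# regmatches() returns the matched substrings, quotes included.
quoted <- unlist(regmatches(cell, gregexpr("\".*?\"", cell, perl = TRUE)))

# gsub() strips the quotes, leaving one element per extracted value.
values <- gsub("\"", "", quoted)
print(values)

# A cell without any quotes produces zero matches, which is why
# the function above falls back to returning the input unchanged.
plain <- "c++"
n_plain <- length(unlist(regmatches(plain, gregexpr("\".*?\"", plain, perl = TRUE))))
print(n_plain)
```

Here values is a character vector of length 2, not one comma-joined string, and that is the property the asker is after.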
df <- structure(list(level = c("c(\"beginner1\",\"beginner2\")", NA,
"beginner3", "intermediate2", NA, "advanced1"), software = c("c++",
"Java", NA, NA, "Matlab", "c(\"java\",\"c++\")"), month = c("Dec",
"Jan", "May", NA, "Mar", "Jul")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
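apply creates 3 columns, but I want to put them in one column…
This is amazing, thank you so much. But str(df$newcol) gives chr [1:6] "beginner1,beginner2,c++,Dec" ..., so each row is still read as a single string rather than "beginner1","beginner2","c++","Dec". So separate character strings cannot be stored in a column?
@rocknRrr Could you dput() your data? If they can be stored in a column, I can try again.
I wish I could share my data, but it is confidential... My data has the same structure as the df you created, though. My understanding so far is that it is not possible to store multiple comma-separated character strings in one variable (or cell)..... Thank you very much!
@rocknRrr I think it is possible to store things in one cell. Please see my update.
@rocknRrr I found that the function in apply does not need as.list, which simplifies the code a bit. Please see my update.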
> df$newcol
$`1`
$`1`$level
[1] "beginner1" "beginner2"
$`1`$software
[1] "c++"
$`1`$month
[1] "Dec"
$`2`
$`2`$software
[1] "Java"
$`2`$month
[1] "Jan"
$`3`
$`3`$level
[1] "beginner3"
$`3`$month
[1] "May"
$`4`
$`4`$level
[1] "intermediate2"
$`5`
$`5`$software
[1] "Matlab"
$`5`$month
[1] "Mar"
$`6`
$`6`$level
[1] "advanced1"
$`6`$software
[1] "java" "c++"
$`6`$month
[1] "Jul"
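The output above is nested by column name. If a flat character vector per row is preferred (closer to the newcol shown in the question), one possible variation is sketched below. It is only a sketch on a shortened copy of the data; split_cell(), flat_row() and cols are hypothetical names introduced here, not part of the original answer.

```r
# Shortened, hypothetical copy of the example data (first three rows only).
df <- data.frame(
  level    = c("c(\"beginner1\",\"beginner2\")", NA, "beginner3"),
  software = c("c++", "Java", NA),
  month    = c("Dec", "Jan", "May"),
  stringsAsFactors = FALSE
)
cols <- c("level", "software", "month")

# Hypothetical helper: split one cell into its quoted values,
# or return the cell as-is when it contains no quotes.
split_cell <- function(u) {
  z <- unlist(regmatches(u, gregexpr("\".*?\"", u, perl = TRUE)))
  if (length(z) > 0) gsub("\"", "", z) else u
}

# Hypothetical helper: collect the non-NA cells of row i into one flat vector.
flat_row <- function(i) {
  cells <- unlist(df[i, cols], use.names = FALSE)
  cells <- cells[!is.na(cells)]
  unlist(lapply(cells, split_cell), use.names = FALSE)
}

# A list with one element per row becomes a list column of the data frame.
df$newcol <- lapply(seq_len(nrow(df)), flat_row)
str(df$newcol)
```

Each element of df$newcol is then its own character vector, so df$newcol[[1]] can be indexed value by value instead of being parsed out of a comma-joined string.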