Add a comma after a person's name in a data frame using R
How can I add a comma after the username in each string, so that I can then remove the words before the comma and get uniform strings for exact matching?
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)
There are two ways to solve this. First, if you can get a list of usernames:
usernames <- c("john", "sunil", "sanjay")
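# sapply() returns a matrix (rows = sentences, columns = usernames);
# diag() keeps sentence i with username i, so usernames must be in row order
diag(sapply(usernames, function(x) gsub(x, paste0(x, ","), a$text)))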
# [1] "hi john, what are you doing" "hi sunil, what are you doing" "hello sanjay, what are you doing"
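The diag(sapply(...)) call above relies on the i-th username belonging to the i-th row. A minimal sketch of an alternative that does not depend on row order, assuming the usernames are plain words without regex metacharacters: build a single alternation pattern and let gsub() append the comma. The names `pattern` and `out` are illustrative and not part of the original answer.

```r
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)
usernames <- c("john", "sunil", "sanjay")

# One pattern that matches any username as a whole word
pattern <- paste0("\\b(", paste(usernames, collapse = "|"), ")\\b")

# "\\1" re-inserts the matched name, followed by a comma
out <- gsub(pattern, "\\1,", a$text, perl = TRUE)
out
```

Because the pattern is applied to every row, a sentence without any known username is simply returned unchanged.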
If you know the username is always the second word in the sentence, you can extract the sentences from the data frame and use:
text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing")
# collect the modified sentences here
text2 <- character(0)
for (sentence in text) {
  # separate words in sentence
  spl <- strsplit(sentence, " ")[[1]]
  # convert the name (second word) to uppercase and put a comma after it
  spl[2] <- paste0(toupper(spl[2]), ",")
  # rebuild the sentence from its words
  sentence2 <- paste(spl, collapse = " ")
  # append to the result vector (text2)
  text2 <- c(text2, sentence2)
}
Then recreate the data frame.
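As a sketch of that last step, the modified sentences can be written straight back into the text column. The example below also does the second-word replacement without a loop, assuming every sentence has at least two words separated by a single space; it uses sub() with perl = TRUE so that \U can upper-case the captured name. This is an alternative formulation, not the code from the answer.

```r
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)

# Capture the first and second words; \U upper-cases the second one and
# \E ends the case conversion before the comma (both need perl = TRUE)
a$text <- sub("^(\\S+) (\\S+)", "\\1 \\U\\2\\E,", a$text, perl = TRUE)
a$text
```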
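Otherwise, if the username can appear at different positions in the sentence but you have a list of usernames to look for, you can check whether at least one of them is found, take its position in the sentence, replace it, add the comma, and rebuild the sentence; if none is found, print an error message.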
usernames <- c("john", "sunil", "sanjay")
text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing",
          "hello ciao how are you")
# collect the modified sentences here
text2 <- character(0)
for (sentence in text) {
  user_present <- NA
  # separate words in sentence
  spl <- strsplit(sentence, " ")[[1]]
  # check if a user is present in the sentence
  for (user in usernames) {
    if (user %in% spl) {
      user_present <- user
      break
    }
  }
  # if at least one user is found
  if (!is.na(user_present)) {
    pos <- which(spl == user_present)
    # convert the name to uppercase and put a comma after it,
    # at the position where it was found (not always the second word)
    spl[pos] <- paste0(toupper(user_present), ",")
    # rebuild the sentence from its words
    sentence2 <- paste(spl, collapse = " ")
    # append to the result vector (text2)
    text2 <- c(text2, sentence2)
  # if NO username in sentence
  } else {
    # print error message with the sentence in which no username was found
    err.msg <- paste("NO username found in sentence:", sentence)
    print(err.msg)
  }
}
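The question's stated goal is to drop the words before the comma once it has been inserted. A minimal sketch of that follow-up step, assuming the first comma in each sentence is the one added after the username; `text2` is re-created here so the snippet runs on its own, and `uniform` is an illustrative name.

```r
text2 <- c("hi JOHN, what are you doing",
           "hi SUNIL, what are you doing",
           "hello SANJAY, what are you doing")

# Remove everything up to and including the first comma,
# plus any whitespace that follows it
uniform <- sub("^[^,]*,\\s*", "", text2)
uniform
```

If a sentence may already contain commas before the username, this pattern would cut at the wrong place and a more specific marker would be needed.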
Hope this helps.
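Comments:
Do you have a list or vector of names?
The problem is that the input file is all lowercase, so it is hard to tell the names apart. Also, please suggest whether there is a way to convert the usernames to uppercase so that we can remove them later.
You need a specific pattern that the names follow; otherwise this is not possible. // Edit: if all entries are structured like this, you can use the second word as the reference for the username.
Hi akrun, I don't have a list of names because it is a large file.
If there is no pattern, then it becomes difficult.
Error in paste(sentence2, spl[[1]][i+1]): object 'sentence2' not found, in the first code (the one where the username is the second word).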
# Output of the first loop (username as the second word):
#text2
#[1] "hi JOHN, what are you doing" "hi SUNIL, what are you doing"
#[3] "hello SANJAY, what are you doing"
# Output of the second loop (username anywhere in the sentence):
#[1] "NO username found in sentence: hello ciao how are you"
text2
#[1] "hi JOHN, what are you doing" "hi SUNIL, what are you doing"
#[3] "hello SANJAY, what are you doing"