Frequency of two-word combinations in text data in R
I have a file with several string (text) variables, in which each respondent wrote a sentence or two for each variable. I would like to find the frequency of each combination of words (i.e. how often "ability" appears with "performance"). My code so far is:
#Setting up the data file
data.text <- scan("C:/temp/tester.csv", what="char", sep="\n")
#Change everything to lower case
data.text <- tolower(data.text)
#Split the strings into separate words
data.words.list <- strsplit(data.text, "\\W+", perl=TRUE)
data.words.vector <- unlist(data.words.list)
#List each word and frequency
data.freq.list <- table(data.words.vector)
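I'm not sure whether this is what you meant, but rather than splitting on the boundary after every second word (which I found painful to attempt with a regex), you could use the trusty head-and-tail sliding trick to paste each pair of adjacent words together. Note that this counts overlapping adjacent pairs within each response: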
# How I read your data
df <- read.table( text = 'ID Reason_for_Dissatisfaction Reason_for_Likelihood_to_Switch
1 "not happy with the service" "better value at other place"
2 "poor customer service" "tired of same old thing"
3 "they are overchanging me" "bad service"
' , h = TRUE , stringsAsFactors = FALSE )
# Split to words
wlist <- sapply( df[,-1] , strsplit , split = "\\W+", perl=TRUE)
# Paste word pairs together
outl <- sapply( wlist , function(x) paste( head(x,-1) , tail(x,-1) , sep = " ") )
# Table as per usual
table(unlist( outl ) )
are overchanging         at other      bad service     better value customer service
               1                1                1                1                1
      happy with        not happy          of same        old thing      other place
               1                1                1                1                1
 overchanging me    poor customer         same old      the service         they are
               1                1                1                1                1
        tired of         value at         with the
               1                1                1
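The answer above counts adjacent word pairs. If the question instead meant how often two words occur anywhere in the same response (for example "ability" together with "performance"), the following is a minimal sketch of that alternative reading. The sample responses and the helper name cooc_pairs are made up for illustration and are not part of the original post.

```r
# Hypothetical sample responses (not taken from the original data file)
responses <- c("ability drives performance",
               "performance reflects ability",
               "poor customer service")

# Lower-case each response and split it into words
words <- strsplit(tolower(responses), "\\W+", perl = TRUE)

# Hypothetical helper: every unordered pair of distinct words in one response.
# Sorting first means "ability performance" and "performance ability"
# end up under the same key.
cooc_pairs <- function(w) {
  w <- sort(unique(w[nzchar(w)]))          # drop empty strings and repeats
  if (length(w) < 2) return(character(0))  # a single word has no pairs
  combn(w, 2, FUN = paste, collapse = " ")
}

# Count each pair once per response in which it occurs
pair_counts <- table(unlist(lapply(words, cooc_pairs)))

# Number of responses containing both words
pair_counts[["ability performance"]]
```

Because each response contributes a pair at most once, the count is the number of responses containing both words, not the number of times they sit next to each other.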
Comments:
What does your data.text look like? Could you provide a sample of a few lines, or some representative example data?
How does the line under "# Paste word pairs together" work?
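On how the pasting line works, a small sketch on a single response may help. head(x, -1) drops the last element and tail(x, -1) drops the first, so pasting the two vectors element by element joins each word to its successor. The second half applies the same idea to a list shaped like data.words.list from the question; the two responses used there are made up to stand in for the CSV file.

```r
# One response, already split into words
x <- c("not", "happy", "with", "the", "service")

head(x, -1)  # all but the last word:  "not" "happy" "with" "the"
tail(x, -1)  # all but the first word: "happy" "with" "the" "service"

# paste() is vectorised: element i of the first vector is joined to
# element i of the second, i.e. each word with the word that follows it
pairs <- paste(head(x, -1), tail(x, -1), sep = " ")
pairs
# "not happy" "happy with" "with the" "the service"

# The same idea on a list of split responses, as in the question's
# data.words.list (two made-up responses stand in for the CSV here)
data.words.list <- strsplit(tolower(c("Poor customer service", "Bad service")),
                            "\\W+", perl = TRUE)
pair.list <- lapply(data.words.list,
                    function(w) paste(head(w, -1), tail(w, -1)))
pair.freq <- table(unlist(pair.list))
pair.freq
```

Using lapply keeps one vector of pairs per response, so pairs never span the boundary between two responses.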