Need to compute n-grams on a dataset
I need to compute n-grams on the dataset below.
Column 1
Electronic, socket, clamp, cooler
Actuator, cylinder, valve
Hydraulic, electronic, cooler
Actuator, Pneumatic, cylinder, valve
I need the output to look like this:
Column 1
Electronic_socket, socket_clamp, clamp_cooler
Actuator_cylinder, cylinder_valve
Hydraulic_electronic, electronic_cooler
Actuator_Pneumatic, Pneumatic_cylinder, cylinder_valve
As an example, assuming your data is in the column column_1 of a data.frame, the following should create your n-grams using base R:
# build original data as a data.frame with 1 column
df <- data.frame( column_1 =
c("Electronic, socket, clamp, cooler",
"Actuator, cylinder, valve",
"Hydraulic, electronic, cooler",
"Actuator, Pneumatic, cylinder, valve"),
stringsAsFactors=FALSE)
# split each row into a vector of words
lov <- strsplit(df$column_1, ', ', fixed=TRUE)
# for each row, pair every word with the word that follows it
sapply(lov, function(x){paste0(x[ -length(x) ], '_', x[-1])})
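The sapply() call above returns one character vector of bigrams per row, while the requested output shows each row as a single comma-separated string. A minimal sketch of one way to get there, assuming the same df as above; the column name bigrams is a hypothetical name chosen for illustration:

```r
# Same example data as in the answer above
df <- data.frame(column_1 = c("Electronic, socket, clamp, cooler",
                              "Actuator, cylinder, valve",
                              "Hydraulic, electronic, cooler",
                              "Actuator, Pneumatic, cylinder, valve"),
                 stringsAsFactors = FALSE)

# Split each row into a vector of words
lov <- strsplit(df$column_1, ", ", fixed = TRUE)

# Build the bigrams for a row, then collapse them into one string.
# vapply() always returns a character vector with one element per row.
# 'bigrams' is a hypothetical column name, not part of the original data.
df$bigrams <- vapply(
  lov,
  function(x) paste(paste0(x[-length(x)], "_", x[-1]), collapse = ", "),
  character(1)
)

df$bigrams[1]
# [1] "Electronic_socket, socket_clamp, clamp_cooler"
```

vapply() is used here instead of sapply() only so that the result type does not depend on how many words each row happens to contain.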
To explain this solution:
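Consider working on just one row at a time.
Suppose you already have the words of one row in a vector:
x <- c("Electronic", "socket", "clamp", "cooler")
x[ -length(x) ] # gives you all but the last word
# [1] "Electronic" "socket" "clamp"
x[-1] # gives you everything except the first word
# [1] "socket" "clamp" "cooler"
# pasting them together with a "_" in between yields the desired output for that row
paste0( x[ -length(x) ], '_', x[-1] )
# [1] "Electronic_socket" "socket_clamp" "clamp_cooler"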
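One edge case may be worth checking with the paste0( x[ -length(x) ], '_', x[-1] ) idiom: a row that contains a single word. Both x[ -length(x) ] and x[-1] are then empty, and paste0() treats zero-length arguments as "", so the result is a lone "_" rather than nothing. The sample data has no such row, so this is only a precaution; bigrams below is a hypothetical helper name:

```r
# What the unguarded idiom produces when a row has only one word
paste0(character(0), "_", character(0))
# [1] "_"

# Hypothetical helper: return no bigrams when there are fewer than two words
bigrams <- function(x) {
  if (length(x) < 2) return(character(0))  # no pair can be formed
  paste0(x[-length(x)], "_", x[-1])
}

bigrams("valve")
# character(0)
bigrams(c("Actuator", "cylinder", "valve"))
# [1] "Actuator_cylinder" "cylinder_valve"
```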
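You can learn more here:
Hi @Rlearn, did the solution below help? Or is there something I can change to improve it for you?
Hi, I found the solution. In case I have to compute 3-grams instead of 2-grams, can we modify the code?
If you ask "How do I create trigrams with base R?", that could be a good new question ;-) But if you just want n-gram results quickly, use the package and the excellent tutorial suggested by @JohnSpring.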
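On the 3-gram follow-up: the shift-and-paste idea can probably be generalized in base R by sliding a window of n words along the row. This is a sketch under that assumption, not the answerer's code; make_ngrams and its arguments n and sep are hypothetical names:

```r
# Hypothetical helper: all n-grams of a word vector, joined with 'sep'
make_ngrams <- function(x, n = 2, sep = "_") {
  k <- length(x) - n + 1            # number of windows of n words
  if (k < 1) return(character(0))   # row is too short for any n-gram
  vapply(seq_len(k),
         function(i) paste(x[i:(i + n - 1)], collapse = sep),
         character(1))
}

x <- c("Electronic", "socket", "clamp", "cooler")
make_ngrams(x, n = 3)
# [1] "Electronic_socket_clamp" "socket_clamp_cooler"
```

With n = 2 this gives the same bigrams as the paste0() version, and lapply(lov, make_ngrams, n = 3) would apply it to every row of a split column.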