Regex: sentence detection and extraction within the same data frame
I have the following data frame:
reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
"Inexpensive. An improvement over integrated graphics.",
"I love that product so excite. I will order again if I need more .",
"Excellent card, great graphics."),
user = c(1,2,3,4),
Review_Id = c("101968","101968","210546","112546"),
stringsAsFactors = FALSE)
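I was thinking of something like sent_detect(reviews$value), but how can I combine that function with the other columns to get the desired output?

If your data really are that tidy, you can use cSplit from my "splitstackshape" package.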
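To combine a per-string splitter (such as the sent_detect idea in the question) with the remaining columns, one option is to split each review on its own and repeat that row's other columns once per sentence. This is a base-R sketch, not part of splitstackshape. The helper names split_to_long and period_splitter are hypothetical. Passing sent_detect as the splitter assumes it returns a character vector of sentences for a single string, which is not checked here.

```r
# Example data from the question
reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                                "Inexpensive. An improvement over integrated graphics.",
                                "I love that product so excite. I will order again if I need more .",
                                "Excellent card, great graphics."),
                      user = c(1, 2, 3, 4),
                      Review_Id = c("101968", "101968", "210546", "112546"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: apply `splitter` (one string -> character vector) to
# each value of `col`, repeating the row's other columns once per sentence.
split_to_long <- function(df, col, splitter) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    sentences <- splitter(df[[col]][i])
    other <- df[rep(i, length(sentences)), setdiff(names(df), col), drop = FALSE]
    other$sentence <- sentences
    other
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Hypothetical splitter: break on a literal "." and trim surrounding spaces
period_splitter <- function(s) trimws(strsplit(s, ".", fixed = TRUE)[[1]])

long <- split_to_long(reviews, "value", period_splitter)
print(long)
```

Swapping in a different splitter only changes how sentences are found; the row-repeating logic stays the same.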
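Are your data really this clean? (For example, does every sentence end with a period followed by a space?)
This function is really great. It solved my task. Thanks again. One last question: if my sentences end not only with "." but also with, for example, "!" or "?", how can I add that to the cSplit function?
@martinkabe, you could try something like
cSplit(reviews, "value", "[.!?]", fixed = FALSE, stripWhite = FALSE, direction = "long")
to split on ".", "!", and "?".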
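With fixed = FALSE the separator is presumably handed on as a regular expression, and "[.!?]" is a character class that matches any one of ".", "!" or "?". The sketch below applies the same class with base R's strsplit to a made-up string (not from the question). Adding "\\s*" to also consume the whitespace after the punctuation is an assumption of this sketch, not part of the suggested cSplit call.

```r
# Made-up review text using "!" and "?" as sentence endings
x <- "Great card! Would I buy again? Yes."

# "[.!?]" matches one ".", "!" or "?"; "\\s*" also consumes trailing spaces.
# The punctuation itself is removed because it is part of the match.
parts <- strsplit(x, "[.!?]\\s*")[[1]]
print(parts)
```

Because the final "." matches at the very end of the string, strsplit does not add a trailing empty string.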
Desired output:
user review_Id sentence
1 101968 Product was received in excellent condition.
1 101968 Made with high quality materials.
1 101968 Very Good product
2 101968 Inexpensive.
2 101968 An improvement over integrated graphics.
3 210546 I love that product so excite.
3 210546 I will order again if I need more .
4 112546 Excellent card, great graphics.
library(splitstackshape)
# Split "value" on "." and stack the pieces in long format (one row per piece)
cSplit(reviews, "value", ".", direction = "long")
# value user Review_Id
# 1: Product was received in excellent condition 1 101968
# 2: Made with high quality materials 1 101968
# 3: Very Good product 1 101968
# 4: Inexpensive 2 101968
# 5: An improvement over integrated graphics 2 101968
# 6: I love that product so excite 3 210546
# 7: I will order again if I need more 3 210546
# 8: Excellent card, great graphics 4 112546
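The cSplit output above drops the periods, while the desired output keeps each sentence's final punctuation. One way to keep it, sketched here in base R rather than with splitstackshape, is to split on whitespace that follows ".", "!" or "?" using a lookbehind (which needs perl = TRUE). The sketch then repeats user and Review_Id once per sentence with rep() and lengths(). The column name sentence follows the desired output. The sketch assumes every sentence boundary has whitespace after the punctuation.

```r
# Example data from the question
reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                                "Inexpensive. An improvement over integrated graphics.",
                                "I love that product so excite. I will order again if I need more .",
                                "Excellent card, great graphics."),
                      user = c(1, 2, 3, 4),
                      Review_Id = c("101968", "101968", "210546", "112546"),
                      stringsAsFactors = FALSE)

# Split on whitespace that follows ".", "!" or "?". The lookbehind (?<=...)
# needs perl = TRUE and leaves the punctuation attached to each sentence.
pieces <- strsplit(reviews$value, "(?<=[.!?])\\s+", perl = TRUE)

# Repeat the id columns once per sentence found in each review
n <- lengths(pieces)
out <- data.frame(user = rep(reviews$user, n),
                  Review_Id = rep(reviews$Review_Id, n),
                  sentence = unlist(pieces),
                  stringsAsFactors = FALSE)
print(out)
```

"I need more ." keeps its space before the period, because the split only happens after punctuation that is followed by whitespace. Punctuation at the very end of a review is left attached.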