Regex: sentence detection and extraction within the same data frame

regex r

I have the following data frame:

reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                               "Inexpensive. An improvement over integrated graphics.",
                               "I love that product so excite. I will order again if I need more .",
                               "Excellent card, great graphics."),
                      user = c(1,2,3,4),
                      Review_Id = c("101968","101968","210546","112546"), 
                      stringsAsFactors = FALSE)

I would like to do something like this:

sent_detect(reviews$value)

But how can I combine that function with the data frame to get the desired output?

If your data really is that clean, you can use cSplit from my "splitstackshape" package.

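Is your data really this clean? (For example, do all sentences end with a period followed by a space?)

This function is really awesome. It solved my task. Thanks a lot again. One last question: what if a sentence ends not only with "." but also with, for example, "!" or "?"; how can I add that to the cSplit function?

@martinkabe, you can try something like

cSplit(reviews, "value", "[.!?]", fixed = FALSE, stripWhite = FALSE, direction = "long")

to split on "!" and "?" as well.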
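To see what the character class [.!?] does on its own, here is a minimal base R sketch using strsplit rather than cSplit. The sample string is made up for illustration and is not part of the reviews data. Note that the delimiter is consumed by the split, so the punctuation is dropped and the leading spaces have to be trimmed separately.

```r
# Hypothetical sample text, not taken from the reviews data frame.
x <- "Great card! Would I buy it again? Yes."

# Split on any one of ".", "!" or "?"; the matched character is removed.
pieces <- strsplit(x, "[.!?]")[[1]]

# Remove the whitespace left at the start of each piece and drop empty pieces.
pieces <- trimws(pieces)
pieces <- pieces[nzchar(pieces)]

print(pieces)
```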

Desired output (from the question):

        user     review_Id                                 sentence
           1        101968        Made with high quality materials.
           1        101968                        Very Good product
           2        101968                             Inexpensive.
           2        101968 An improvement over integrated graphics.
           3        210546           I love that product so excite.
           3        210546      I will order again if I need more .
           4        112546          Excellent card, great graphics.

library(splitstackshape)
cSplit(reviews, "value", ".", direction = "long")
#                                          value user Review_Id
# 1: Product was received in excellent condition    1    101968
# 2:            Made with high quality materials    1    101968
# 3:                           Very Good product    1    101968
# 4:                                 Inexpensive    2    101968
# 5:     An improvement over integrated graphics    2    101968
# 6:               I love that product so excite    3    210546
# 7:           I will order again if I need more    3    210546
# 8:              Excellent card, great graphics    4    112546
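
The cSplit result above drops the "." delimiter, whereas the desired output keeps the punctuation at the end of each sentence. As an alternative, here is a rough base R sketch (not the answer's method) that splits on the whitespace following ".", "!" or "?" by using a lookbehind, so the punctuation stays attached. The names parts, n, and sentences are my own, and the sketch assumes every sentence boundary is punctuation followed by whitespace.

```r
reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE
)

# Split on whitespace that is preceded by ".", "!" or "?".
# The lookbehind is zero-width, so only the whitespace is removed.
parts <- strsplit(reviews$value, "(?<=[.!?])\\s+", perl = TRUE)

# Repeat each row's identifiers once per sentence found in that row.
n <- lengths(parts)
sentences <- data.frame(
  user = rep(reviews$user, n),
  Review_Id = rep(reviews$Review_Id, n),
  sentence = unlist(parts),
  stringsAsFactors = FALSE
)

print(sentences)
```

This will mis-split abbreviations such as "e.g. " or "Dr. ", so for messier text a dedicated sentence tokenizer is probably the safer choice.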