Regex: searching for words within sentences in R

regex r

I would like to ask for your advice on the following. I have a data frame:

reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                            "Inexpensive. An improvement over integrated graphics.",
                            "I love that product so excite. I will order again if I need more .",
                            "Excellent card, great graphics."),
                            user = c(1,2,3,4),
                            Review_Id = c("101968","101968","210546","112546"))

Any suggestion or approach would be greatly appreciated. Thanks in advance.

You could try:

merge.data.frame(x = topics, y = reviews, by = c("Review_Id"), all.x = TRUE, all.y = FALSE)
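To illustrate what this merge returns, here is a minimal sketch. The topics data frame is not shown in the thread, so the one below is a hypothetical reconstruction based on the desired output; treat its contents as an assumption, not the asker's actual data.

```r
# Reviews from the question; stringsAsFactors = FALSE keeps the text as character.
reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: hypothetical topics table, reconstructed from the desired output.
topics <- data.frame(
  topic = c("product", "condition", "materials", "integrated graphics",
            "product", "card", "graphics"),
  user = c(1, 1, 1, 2, 3, 4, 4),
  Review_Id = c("101968", "101968", "101968", "101968",
                "210546", "112546", "112546"),
  stringsAsFactors = FALSE
)

# Merge on Review_Id only: Review_Id "101968" belongs to two users,
# so every topic for that id is paired with both reviews.
by_id <- merge.data.frame(x = topics, y = reviews, by = c("Review_Id"),
                          all.x = TRUE, all.y = FALSE)

# Merge on Review_Id and user: each topic is paired with its own review only.
by_both <- merge.data.frame(x = topics, y = reviews, by = c("Review_Id", "user"),
                            all.x = TRUE, all.y = FALSE)

nrow(by_id)    # more rows than topics, because of the shared Review_Id
nrow(by_both)  # one row per topic
```

With this assumed data, merging on Review_Id alone yields 11 rows and two user columns (user.x and user.y), while merging on both keys yields 7 rows. In either case the value column still holds the whole review, not just the sentence that mentions the topic.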

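Have you tried merge? I.e.

merge(topics, reviews)

In topics, is it normal for the same Review_Id to be linked to two different users? If not, you could try

merge.data.frame(x = topics, y = reviews, by = c("Review_Id"), all.x = TRUE, all.y = FALSE)

or, once the double-user issue is fixed,

merge.data.frame(x = topics, y = reviews, by = c("Review_Id", "user"), all.x = TRUE, all.y = FALSE)

I did that. When I use merge, I get all the sentences in one row, but I only need the sentence that contains the specific topic.

Yes, the same Review_Id can be linked to two different users. The problem is that I only need the one sentence that contains the specific topic. Any ideas? The desired output is what I need.

Assuming you add

stringsAsFactors = FALSE

to the reviews data frame, the following code returns a logical vector with one element per sentence of the first review, TRUE for each sentence that contains the first topic:

grepl(topics$topic[1], strsplit(reviews$value[1], '.', fixed = TRUE)[[1]])

The rest should be straightforward.

Thank you, this is what I needed before using a smarter regex to extract the specific sentences containing the topic.

Comment: please mind the quality of your post.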
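The grepl/strsplit idea from the comment above can be extended to every topic. The following is only a sketch under stated assumptions: the topics data frame is hypothetical (reconstructed from the desired output, since the thread does not show it), sentences are split on a literal ".", and matching is case-insensitive so that "product" also matches "Product".

```r
reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: hypothetical topics table, reconstructed from the desired output.
topics <- data.frame(
  topic = c("product", "condition", "materials", "integrated graphics",
            "product", "card", "graphics"),
  user = c(1, 1, 1, 2, 3, 4, 4),
  Review_Id = c("101968", "101968", "101968", "101968",
                "210546", "112546", "112546"),
  stringsAsFactors = FALSE
)

# Pair each topic with the full text of its own review.
merged <- merge(topics, reviews, by = c("Review_Id", "user"), all.x = TRUE)

# For each topic/review pair, split the review into sentences and
# keep only the sentences that mention the topic.
rows <- lapply(seq_len(nrow(merged)), function(i) {
  sentences <- trimws(strsplit(merged$value[i], ".", fixed = TRUE)[[1]])
  sentences <- sentences[nzchar(sentences)]
  # Lower-case both sides for a case-insensitive literal match.
  hits <- sentences[grepl(tolower(merged$topic[i]), tolower(sentences), fixed = TRUE)]
  if (length(hits) == 0) return(NULL)
  data.frame(topic = merged$topic[i], user = merged$user[i],
             Review_Id = merged$Review_Id[i], review = hits,
             stringsAsFactors = FALSE)
})

# Drop pairs without a matching sentence and stack the rest.
result <- do.call(rbind, Filter(Negate(is.null), rows))
result
```

Splitting on "." drops the trailing period from each sentence, and it would also break on abbreviations or decimal numbers; a dedicated sentence tokenizer may be a better fit for messier text.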

Desired output:

topic                user  Review_Id  review
product              1     101968     Product was received in excellent condition.
condition            1     101968     Product was received in excellent condition.
materials            1     101968     Made with high quality materials.
product              1     101968     Very Good product
integrated graphics  2     101968     An improvement over integrated graphics.
product              3     210546     I love that product so excite.
card                 4     112546     Excellent card, great graphics.
graphics             4     112546     Excellent card, great graphics.