
Matching words in two columns using R


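I have two data frames: DF1 is a dictionary of words and DF2 contains sentences. I want to do text matching: if any word from DF1 appears in a DF2 sentence, a column next to that sentence should show Yes, otherwise No. The data are as follows:

DF1 (word dictionary):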

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 (sentences):

DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")
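The output should be:

Customer satisfaction index improvement    Yes
reduction in retail cycle                  No
Improve market share                       Yes
% recovery from vendor                     No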

Note: Yes and No go in a separate column that shows the result of the text match. Can anyone help? Thanks in advance.

Try this:

DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor")


result <- cbind(DF2, "word found" = ifelse(rowSums(sapply(DF1, grepl, x = DF2)) > 0, "YES", "NO"))

> result
     DF2                                       word found
[1,] "Customer satisfaction index improvement" "YES"     
[2,] "reduction in retail cycle"               "NO"      
[3,] "Improve market share"                    "YES"     
[4,] "% recovery from vendor"                  "NO"    
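
To see why the one-liner above works, it can help to look at the intermediate step. This is a minimal sketch using the same DF1 and DF2; the names `hits` and `matched` are illustrative and not part of the original answer. `sapply` builds a logical matrix with one row per sentence and one column per dictionary word, and `rowSums` counts the hits in each row. The same matrix can also list which words matched.

```r
DF1 <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")
DF2 <- c("Customer satisfaction index improvement", "reduction in retail cycle",
         "Improve market share", "% recovery from vendor")

# One grepl() call per dictionary word: rows are sentences, columns are words
hits <- sapply(DF1, grepl, x = DF2)

# Number of dictionary words found in each sentence
n_hits <- rowSums(hits)

# Collect the matched words per sentence (empty string when nothing matched)
matched <- apply(hits, 1, function(r) paste(names(r)[r], collapse = ", "))

print(hits)
print(data.frame(sentence = DF2, n_hits = n_hits, matched = matched))
```

Note that `grepl` is case-sensitive and matches substrings by default, so "Improve" is not counted as a hit for "improvement", while "market" and "share" are both found in the third sentence.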

You can do it like this:

df <- data.frame(sentence = c("Customer satisfaction index improvement", "reduction in retail cycle", "Improve market share", "% recovery from vendor"))
words <- c("csi", "dsi", "market", "share", "improvement", "dealers", "increase")

# combine the words in a regular expression and bind it as column yes
df <- cbind(df, yes = grepl(paste(words, collapse = "|"), df$sentence))

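Output:

                                 sentence   yes
1 Customer satisfaction index improvement  TRUE
2               reduction in retail cycle FALSE
3                    Improve market share  TRUE
4                  % recovery from vendor FALSE

Please rework your question to include both datasets in a format that can be copied and pasted, along with the final result; otherwise it is hard to answer your question.

DF1 is the first data frame and DF2 is the second data frame. The output should look like this: if the first row of DF2 is "Customer satisfaction index improvement", it shows Yes.

Yes, I understand, but it is not in a format that someone can easily copy and paste into an R session to work on an answer. You could post the output of dput(DF1) or something similar to make it easier.

See the answer and let me know.

When I apply it to my full dataset, the output shows only "Yes". What does that mean?

I guess your full dataset just contains more words in DF1 or more sentences in DF2; in either case nothing should change.

DF1 contains more words as its word dictionary, and DF2 holds descriptions written as sentences. I only gave a sample because I cannot paste the full data here.

Yes, I understand what you mean. The only reason I can think of is that the words or the sentences are not stored as a vector. Could it be that you have just one big string? Please paste the results of str(DF2) and str(DF1).

Can you share your email id? I will send you the sample data frame as an Excel file.

@Roshan: Then please provide more input.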
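
The thread never settles on why the full dataset returns only "Yes". One possible cause (an assumption, not confirmed anywhere in the thread) is a blank entry in the word dictionary: an empty pattern matches every string, so every row counts as a hit. The sketch below reproduces that with a made-up dictionary, then uses a hypothetical helper, `match_words`, that drops blank and `NA` entries before matching. The helper also matches whole words and ignores case; both choices are assumptions about what the asker wants, and dictionary entries are still treated as regular expressions.

```r
sentences <- c("Customer satisfaction index improvement", "reduction in retail cycle",
               "Improve market share", "% recovery from vendor")

# Made-up dictionary with a blank entry, e.g. from an empty spreadsheet cell
dirty_words <- c("csi", "market", "", "improvement")

# An empty pattern matches everything, so every sentence is flagged
naive <- rowSums(sapply(dirty_words, grepl, x = sentences)) > 0

# Hypothetical helper: clean the dictionary, then match whole words, ignoring case
match_words <- function(sentences, words) {
  words <- trimws(as.character(words))
  words <- words[!is.na(words) & nzchar(words)]   # drop NA and blank entries
  if (length(words) == 0) return(rep("No", length(sentences)))
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  found <- grepl(pattern, as.character(sentences), ignore.case = TRUE, perl = TRUE)
  ifelse(found, "Yes", "No")
}

result <- data.frame(sentence = sentences,
                     match = match_words(sentences, dirty_words),
                     stringsAsFactors = FALSE)

print(naive)
print(result)
```

If the real data behave differently, the output of `str(DF1)` and `str(DF2)`, as requested in the comments, is still the quickest way to check whether the inputs are plain character vectors.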