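Creating a column based on another column using grepl

Let's consider a df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word, and whether more characters come before or after it. The final result should look like this:

WORD    STEM  NEW
rerun   run   prefixed
runner  run   suffixed
run     run   none
...     ...   ...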

Tags: r, string, dataframe, grepl, startswith

Below you can see my code. However, it does not work, because the grepl expression is applied to all rows of df. Anyway, I think it should illustrate my idea.

df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
             ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
                ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
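
For context, grepl is not vectorized over its pattern argument: given a vector of patterns, it uses only the first element and issues a warning. That is why the nested ifelse above compares every word against the first stem. A minimal sketch of the effect follows; the demo data frame and its values are assumptions made for illustration, not data from the question.

```r
# Hypothetical data with two different stems (not from the question)
demo <- data.frame(word = c("rerun", "walker"),
                   stem = c("run", "walk"),
                   stringsAsFactors = FALSE)

# Only the first pattern ("run") is used for every row; R warns about this
first_only <- suppressWarnings(grepl(demo$stem, demo$word))

# Pairing each pattern with its own word needs an explicit row-wise call
per_row <- mapply(grepl, demo$stem, demo$word, USE.NAMES = FALSE)

print(first_only)
print(per_row)
```

Here "walker" is only recognized as containing its stem in the row-wise version.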

You can use mapply to call grepl once per row, like this:

ifelse(mapply(grepl, paste0(".+", x$STEM, ".+"), x$WORD), "both",
ifelse(mapply(grepl, paste0(x$STEM, ".+"), x$WORD), "suffixed",
ifelse(mapply(grepl, paste0(".+", x$STEM), x$WORD), "prefixed", "none")))
#"prefixed" "suffixed"     "none" 
Or use startsWith and endsWith and pick the label by indexing into a vector:

# lookup: 1 = none, 2 = both, 3 = suffixed (word starts with stem), 4 = prefixed (word ends with stem)
c("none", "both", "suffixed", "prefixed")[1 + (1 + startsWith(x$WORD, x$STEM) +
 2*endsWith(x$WORD, x$STEM)) * (nchar(x$WORD) > nchar(x$STEM) &
 mapply(grepl, x$STEM, x$WORD))]
#[1] "prefixed" "suffixed" "none"
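
To try the mapply variant end to end, here is a self-contained sketch. The data frame x is rebuilt from the table in the question plus one extra row ("aruna") to exercise the "both" case; the helper has() is a hypothetical name introduced only to shorten the calls.

```r
# Sample data: the rows from the question plus one "both" case
x <- data.frame(WORD = c("rerun", "runner", "run", "aruna"),
                STEM = "run",
                stringsAsFactors = FALSE)

# Hypothetical helper: test each pattern against the word in the same row
has <- function(pattern) mapply(grepl, pattern, x$WORD, USE.NAMES = FALSE)

x$NEW <- ifelse(has(paste0(".+", x$STEM, ".+")), "both",
         ifelse(has(paste0(x$STEM, ".+")), "suffixed",
         ifelse(has(paste0(".+", x$STEM)), "prefixed", "none")))

print(x)
```

Because the stems are pasted into regular expressions, a stem containing characters such as . or ( would be interpreted as regex syntax rather than matched literally.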

You can create new like this:
df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
                 ifelse(startsWith(df$word, df$stem), 'suffixed',
                        ifelse(endsWith(df$word, df$stem), 'prefixed',
                               'both')))
Output

#       word stem      new
# 1    rerun  run prefixed
# 2   runner  run suffixed
# 3      run  run     none
# 4    aruna  run     both
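
One thing worth checking with the startsWith/endsWith approach: the final branch returns 'both' whenever the word neither starts nor ends with the stem, even if the stem does not occur in the word at all. The following is a sketch of one way to separate that case, assuming literal (non-regex) stems. The function name classify_stem and the extra label "absent" are hypothetical and not part of the original answer.

```r
# Hypothetical helper; "absent" is an assumed label for words without the stem
classify_stem <- function(word, stem) {
  word <- as.character(word)
  stem <- as.character(stem)
  # fixed = TRUE matches the stem literally instead of as a regular expression
  contained <- mapply(grepl, stem, word,
                      MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  starts <- startsWith(word, stem)
  ends   <- endsWith(word, stem)
  ifelse(!contained, "absent",
  ifelse(starts & ends, "none",
  ifelse(starts, "suffixed",
  ifelse(ends, "prefixed", "both"))))
}

words  <- c("rerun", "runner", "run", "aruna", "walk")
labels <- classify_stem(words, rep("run", length(words)))
print(data.frame(word = words, new = labels))
```

As in the answer above, a word that both starts and ends with the stem (for example "runrun") is still labelled "none"; whether that is the desired behaviour depends on the data.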

Here is an approach using str_locate from stringr together with dplyr:

library(dplyr)
library(stringr)
data %>%
  mutate_at(vars(WORD,STEM), as.character) %>%
  mutate(NEW = 
         case_when(str_locate(WORD,STEM)[,"start"] > 1 &
                   str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "both",
                   str_locate(WORD,STEM)[,"start"] > 1 ~ "prefixed",
                   str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "suffixed",
                   TRUE ~ "none"))
    WORD STEM      NEW
1  rerun  run prefixed
2 runner  run suffixed
3    run  run     none
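
The logic above relies on str_locate returning an integer matrix with start and end columns, with NA where there is no match. For readers without stringr installed, the same position-based idea can be sketched in base R. The helper locate_stem is a hypothetical name, and it assumes stems should be matched literally.

```r
# Hypothetical base-R stand-in for the start/end positions used above
locate_stem <- function(word, stem) {
  word <- as.character(word)
  stem <- as.character(stem)
  start <- mapply(function(s, w) as.integer(regexpr(s, w, fixed = TRUE)),
                  stem, word, USE.NAMES = FALSE)
  start[start < 0] <- NA   # regexpr() returns -1 when there is no match
  cbind(start = start, end = start + nchar(stem) - 1L)
}

words <- c("rerun", "runner", "run", "aruna", "walk")
pos   <- locate_stem(words, rep("run", length(words)))

# Characters before the match, and characters after the match
before <- !is.na(pos[, "start"]) & pos[, "start"] > 1
after  <- !is.na(pos[, "end"]) & pos[, "end"] < nchar(words)

new <- ifelse(before & after, "both",
       ifelse(before, "prefixed",
       ifelse(after, "suffixed", "none")))
print(data.frame(word = words, new = new))
```

Only the first occurrence of the stem in each word is considered, which matches how the positions are used in the case_when call.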

I added a line of code that converts WORD and STEM to character, in case they are factors.

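Thanks for the quick reply. I chose this answer as the solution because it is the most similar to my approach. In any case, Ian Campbell also solved the problem.

@hyhno01 Just to let you know, I updated my answer: I removed the nchar comparison between word and stem, because I realized it was redundant.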