R: How to isolate a word next to a specified word

My data frame contains a variety of strings. See the example df:

strings <- c("Average complications and higher payment",
             "Average complications and average payment",
             "Average complications and lower payment",
             "Average mortality and higher payment",
             "Better mortality and average payment")
df <- data.frame(strings, stringsAsFactors = FALSE)

Using the strsplit, head, and tail functions:

outDF <- do.call(rbind, lapply(df$strings, function(x) {
  # Split the string into words
  strObj <- unlist(strsplit(x, split = " "))

  # One-row data frame: first word and second-to-last word
  data.frame(strings = x,
             QualityWord = head(strObj, 1),
             PaymentWord = head(tail(strObj, 2), 1),
             stringsAsFactors = FALSE)
}))

outDF
#                                    strings QualityWord PaymentWord
#1  Average complications and higher payment     Average      higher
#2 Average complications and average payment     Average     average
#3   Average complications and lower payment     Average       lower
#4      Average mortality and higher payment     Average      higher
#5      Better mortality and average payment      Better     average
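
The same split-and-index idea can be sketched without building one data frame per row. This is a minimal variant, not the answerer's code: it assumes every string contains at least two space-separated words, and the name `words` is introduced here only for illustration.

```r
strings <- c("Average complications and higher payment",
             "Average complications and average payment",
             "Average complications and lower payment",
             "Average mortality and higher payment",
             "Better mortality and average payment")
df <- data.frame(strings, stringsAsFactors = FALSE)

# Split every string into its words once (assumes single spaces as separators)
words <- strsplit(df$strings, " ", fixed = TRUE)

# First word of each string
df$QualityWord <- vapply(words, function(w) w[1], character(1))

# Second-to-last word, i.e. the word just before the final "payment"
# (assumes each string has at least two words)
df$PaymentWord <- vapply(words, function(w) w[length(w) - 1], character(1))

df
```

Assigning the columns directly keeps the original df and avoids the repeated rbind calls, which may matter on larger inputs.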

Or, using dplyr and a custom function:

library(dplyr)

# Return a one-row data frame with the first and second-to-last words
customFn <- function(x) {
  strObj <- unlist(strsplit(x, split = " "))
  data.frame(strings = x,
             QualityWord = head(strObj, 1),
             PaymentWord = head(tail(strObj, 2), 1),
             stringsAsFactors = FALSE)
}

df %>%
  dplyr::rowwise() %>%
  dplyr::do(customFn(.$strings))
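
Using sub with a regular expression:

df$QualityWord <- sub("(\\w+).*?$", "\\1", df$strings)
df$PaymentWord <- sub(".*?(\\w+) payment$", "\\1", df$strings)
df
#>                                     strings QualityWord PaymentWord
#> 1  Average complications and higher payment     Average      higher
#> 2 Average complications and average payment     Average     average
#> 3   Average complications and lower payment     Average       lower
#> 4      Average mortality and higher payment     Average      higher
#> 5      Better mortality and average payment      Better     average
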
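The regex terms are explained as follows:

  • (\\w+)
    = match word characters one or more times, captured as a group
  • .*?
    = match anything, non-greedily
  • " payment"
    = match a space followed by the characters payment
  • $
    = match the end of the string
  • \\1
    = replace the matched pattern with the contents of the first group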
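
The capture-group idea above can be generalized to any specified word. The following is a rough sketch rather than part of the original answer: `word_before` is a hypothetical helper name, `target` is assumed to be a plain word with no regex metacharacters, and `perl = TRUE` is chosen here only to get PCRE matching semantics.

```r
# Hypothetical helper: return the word immediately before `target` in each
# string, or NA when `target` is absent or has no preceding word
word_before <- function(x, target) {
  # Builds e.g. "\\b(\\w+)\\s+payment\\b"; `target` is assumed to be a plain word
  pattern <- paste0("\\b(\\w+)\\s+", target, "\\b")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit holds the full match followed by the captured group
  vapply(hits,
         function(h) if (length(h) >= 2) h[2] else NA_character_,
         character(1))
}

strings <- c("Average complications and higher payment",
             "Average complications and average payment",
             "Average complications and lower payment",
             "Average mortality and higher payment",
             "Better mortality and average payment")

payment_words   <- word_before(strings, "payment")
mortality_words <- word_before(strings, "mortality")
```

Unlike sub, which returns the input unchanged when the pattern does not match, this sketch returns NA for strings that lack the target word.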

We can use extract from tidyr:

library(tidyverse)
df %>%
   extract(strings, into = c("QualityWord", "PaymentWord"),
           "^(\\w+).*\\b(\\w+)\\s+\\w+$", remove = FALSE)
#                                   strings QualityWord PaymentWord
#1  Average complications and higher payment     Average      higher
#2 Average complications and average payment     Average     average
#3   Average complications and lower payment     Average       lower
#4      Average mortality and higher payment     Average      higher
#5      Better mortality and average payment      Better     average