
Cut function selection in R

Tags: r, regex, dplyr

I have some data in a table:

Person.ID    Household.ID    Composition 
   1             4593           1A_0C
   2             4992           2A_1C
   3             9843           1A_1C 
   4             8385           2A_2C  
   5             9823           8A_1C 
   6             3458           1C_9C 
   7             7485           2C_0C 
   :               :              :    
We can think of the Composition variable as a count of adults and children, i.e. 2A_1C means two adults and one child.

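What I want to do is reduce the number of possible levels of Composition. For person 5 we have a composition of 8A_1C, and I am looking for a way to reduce this to 4+A_1C. In other words, any count greater than 4 should be written as 4+.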

Person.ID     Household.ID     Composition 
    5             9823            4+A_1C
    6             3458             1A_4+C
    :               :                :
I'm not sure how to do this in R. I was thinking of using filter() or select() from dplyr; otherwise I would need some kind of regular expression.


Any help would be appreciated. Thanks.

We can use gsub:

df$Composition <- gsub("(?<!\\d)([5-9]|\\d{2,})(?=[AC])", "4+", df$Composition, perl = TRUE)
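To make the pattern easier to read, here is a minimal sketch that wraps the same gsub call in a function and annotates each part of the regular expression. The function name cap_counts and the sample vector are illustrative assumptions, not part of the original answer.

```r
# Hypothetical wrapper around the gsub call above; the name is illustrative.
cap_counts <- function(x) {
  # (?<!\\d)         lookbehind: the match must start at the first digit of a number
  # ([5-9]|\\d{2,})  a single digit from 5 to 9, or any number with two or more digits
  # (?=[AC])         lookahead: the number must be followed by A or C (not consumed)
  gsub("(?<!\\d)([5-9]|\\d{2,})(?=[AC])", "4+", x, perl = TRUE)
}

# Made-up sample values covering small counts, large counts and two-digit counts
sample_values <- c("1A_0C", "4A_4C", "8A_1C", "1A_9C", "12A_10C")
cap_counts(sample_values)
# "1A_0C" "4A_4C" "4+A_1C" "1A_4+C" "4+A_4+C"
```

One thing to keep in mind is that the two-or-more-digits branch matches on length rather than value, so a zero-padded count such as 04A would also be replaced with 4+.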
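Applied to the question's data, the gsub call gives:

  Person.ID Household.ID Composition
1         1         4593       1A_0C
2         2         4992       2A_1C
3         3         9843       1A_1C
4         4         8385       2A_2C
5         5         9823      4+A_1C
6         6         3458      1C_4+C
7         7         7485       2C_0C

It seems like it would be much easier if you first parse the Composition string and split it into two columns, one for the number of adults and one for the number of children. You can then replace every entry greater than 4 with 4+ and turn the result back into a string by pasting the pieces together.
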
library(tibble)  # tibble()
library(dplyr)   # %>%

Person.ID <- c(1,2,3,4,5,6,7)
Household.ID <- c(4593,4992,9843,8385,9823,3458,7485)
Composition <- c("1A_0C","2A_1C","1A_1C","2A_2C","8A_1C","1A_9C","2A_0C")
dat <- tibble(Person.ID, Household.ID, Composition)

# Keep only the digits, compare them as a number, and cap anything above 4
above4 <- function(f){
    ff <- gsub("[^0-9]","",f)
    if(as.numeric(ff) > 4) "4+" else ff
}

# Split Composition into its adult and child parts, then cap each part
dat_ <- dat %>% tidyr::separate(col=Composition,
                           into=c("Adults", "Children"),
                           sep="_") %>%
        dplyr::mutate(Adults_ = unlist(lapply(Adults,above4)),
                         Children_ = unlist(lapply(Children,above4)))

# Paste the capped parts back together into a single Composition string
dat_ %>% dplyr::mutate(Composition_ = paste0(Adults_, "A_", Children_, "C")) %>%
         dplyr::select(Person.ID, Household.ID, Composition=Composition_)

 # A tibble: 7 x 3
      Person.ID Household.ID Composition
          <dbl>        <dbl> <chr>
    1        1.        4593. 1A_0C
    2        2.        4992. 2A_1C
    3        3.        9843. 1A_1C
    4        4.        8385. 2A_2C
    5        5.        9823. 4+A_1C
    6        6.        3458. 1A_4+C
    7        7.        7485. 2A_0C
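The parse, cap and paste steps from this answer can also be written in a vectorized way with base R only. The following is a minimal sketch under the assumption that every value has the form digits, A, underscore, digits, C. The function name cap_composition and its limit argument are hypothetical and do not come from the original answer.

```r
# Hypothetical helper; assumes every value looks like "<digits>A_<digits>C".
cap_composition <- function(x, limit = 4) {
  # Pull out the adult and child counts as numbers
  adults   <- as.numeric(sub("^(\\d+)A_\\d+C$", "\\1", x))
  children <- as.numeric(sub("^\\d+A_(\\d+)C$", "\\1", x))

  # Replace any count above the limit with "<limit>+"
  cap <- function(n) ifelse(n > limit, paste0(limit, "+"), as.character(n))

  # Paste the capped counts back into the original layout
  paste0(cap(adults), "A_", cap(children), "C")
}

cap_composition(c("1A_0C", "2A_2C", "8A_1C", "1A_9C", "12A_10C"))
# "1A_0C" "2A_2C" "4+A_1C" "1A_4+C" "4+A_4+C"
```

Because the counts are compared as numbers, two-digit values such as 12A are capped correctly. Values that do not follow the assumed layout would come back as NA, so they should be checked before relying on the result.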