Replace multiple matches of the same string with different patterns in R

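I have a set of data frames. In each of them the same string appears several times in one column, but each occurrence reflects a different observation.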

library(dplyr)
library(stringi)  # provides stri_replace_first_fixed(), used below
govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local", 
          "General government", "Government enterprises")
df <- data.frame("gov_levels" = govs, revenue = rnorm(7, mean = 1000, sd = 50))
df
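But this depends on whether "General government" falls on an even or an odd row, which is not consistent, as shown when I drop the first row before mutating: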

    df %>%
    filter(gov_levels != "Government") %>%
    mutate(gov_levels = stri_replace_first_fixed(str = gov_levels, pattern = "General government", 
                                           replacement = c("Federal general government", 
                                                           "State and local general government"))) 
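This produces the wrong replacement order. I'm looking for a way to apply the replacement consistently, so that it does not depend on the row position of the string being replaced. That is, the first match should always be replaced by "Federal general government" and the second match by "State and local general government".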
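A minimal sketch of why the row position matters, assuming the length-2 `replacement` vector is recycled along the rows the way vectorised R functions normally do. It uses only base R (`rep_len()`) to show which replacement each row is paired with. The name `govs_trimmed` is made up for this illustration.

```r
replacement <- c("Federal general government",
                 "State and local general government")

# The column after the "Government" row has been filtered out
govs_trimmed <- c("Federal", "General government", "Government enterprises",
                  "State and local", "General government", "Government enterprises")

# Recycling pairs row i with replacement[((i - 1) %% 2) + 1]
paired <- rep_len(replacement, length(govs_trimmed))

# Rows that actually contain the target string, and what each one is paired with
hits <- govs_trimmed == "General government"
mismatch <- data.frame(row = which(hits), receives = paired[hits])
print(mismatch)
```

Here the matches sit in rows 2 and 5, so the first match is paired with the second replacement and the second match with the first, which is the reversed order described above.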

Update based on George's answer: a list of data frames with some inconsistencies:

govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local", 
          "General government", "Government enterprises", NA, NA)

df1 <- data.frame("col_1" = "col1data", "gov_levels" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)
df2 <- data.frame("col_1" = "col1data", "gov_types" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)

df2 <- df2 %>%
       filter(gov_types != "Government")

df_list <- list(df1, df2)

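Does this do what you want? Simply replace the elements that meet the condition with the desired vector of strings. If the column is a factor, you first need to add the new strings to its allowed levels, otherwise you will get an error.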

# First define a string containing the new levels for the 'gov_levels' factor
newlevels <- c("Federal general government", "State and local general government")

# Then add them so that they are allowed as factor levels
levels(df$gov_levels) <- c(levels(df$gov_levels), newlevels)

# Now just replace the values where 'gov_levels' is "General government" with the new string
# They will naturally be assigned in the same order they occur in the dataset
df$gov_levels[df$gov_levels=="General government"] <- newlevels
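For a character column that may contain `NA`, a hedged variant of the same idea is to index with `which()`, which drops `NA` comparisons, and to check that the number of matches equals the number of new labels before assigning. The data below is a small stand-in for the question's data frame, and `new_labels` and `idx` are illustrative names.

```r
govs <- c("Federal", "General government", "Government enterprises",
          "State and local", "General government", "Government enterprises", NA)
df <- data.frame(gov_levels = govs, stringsAsFactors = FALSE)

new_labels <- c("Federal general government",
                "State and local general government")

# which() returns only the positions where the comparison is TRUE,
# so NA rows are skipped instead of breaking the assignment
idx <- which(df$gov_levels == "General government")

# Guard against silent recycling if a data frame has more or fewer matches
stopifnot(length(idx) == length(new_labels))

# Labels are assigned in order of occurrence, regardless of row parity
df$gov_levels[idx] <- new_labels
print(df)
```

Because the assignment goes by match order rather than row number, removing or adding rows above the first match does not change which label each match receives.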


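Thanks, this is very helpful. Yes, in this case I always need to match both occurrences and to distinguish federal from S&L. I'm working with a list of similar data frames, so I'm extending this solution to use it with lapply. That raises a few issues: 1) the gov_levels column is of type chr rather than factor in all of the dfs, 2) there are NAs in the column, 3) the column name differs in some data frames. This makes it less straightforward, but I think I have a solution.

@CaseyR If your vector is a character vector to begin with, then you don't need the levels bit at all. I'll edit.

I'm also not sure why the NAs are a problem, but I get an error message on x[,2][x[,2]=="General government"]. I should add that the column names differ, but the relevant data is always in the second column.

@CaseyR Good point, subscripted assignments can't have NA. Since you know it is the second column, why not just use [,2] as you did? Any other issues?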

newlevels_gen <- c("Federal general government", "State and local general government")

df_list <- lapply(df_list, 
                  function(x) {x[, 2] <- as.factor(x[, 2]) 
                               return(x)
                               }
                  )

df_list <- lapply(df_list, function(x) {levels(x[,2]) <- c(levels(x[,2]), newlevels_gen)
                                        return(x)
                                        }
                  )

df_list_clean_a <- lapply(df_list, function(x) {x[,2][!is.na(x[,2]) & x[,2] == "General government"] <- newlevels_gen 
                                               return(x)
                                               } 
                         )
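The three lapply passes above could also be folded into one helper that works on character columns directly, so no factor conversion is needed. This is only a sketch under the assumptions stated in the comments: the relevant column is always the second one, and a data frame is left unchanged (with a warning) unless it contains exactly as many matches as there are new labels. `relabel_general` and its arguments are hypothetical names, and the revenue values are fixed numbers chosen for reproducibility.

```r
# Hypothetical helper: relabel the matches of `target` in column `col`,
# in order of occurrence, ignoring NA values
relabel_general <- function(x, col = 2,
                            target = "General government",
                            new_labels = c("Federal general government",
                                           "State and local general government")) {
  values <- as.character(x[[col]])   # also handles factor columns
  idx <- which(values == target)     # which() skips NA
  if (length(idx) != length(new_labels)) {
    warning("unexpected number of matches; data frame left unchanged")
    return(x)
  }
  values[idx] <- new_labels
  x[[col]] <- values
  x
}

govs <- c("Government", "Federal", "General government", "Government enterprises",
          "State and local", "General government", "Government enterprises", NA, NA)
df1 <- data.frame(col_1 = "col1data", gov_levels = govs,
                  revenue = c(seq(100, 160, by = 10), NA, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col_1 = "col1data", gov_types = govs,
                  revenue = c(seq(100, 160, by = 10), NA, NA),
                  stringsAsFactors = FALSE)
df2 <- df2[-1, ]  # drop the "Government" row so the matches shift by one

df_list_clean <- lapply(list(df1, df2), relabel_general)
print(df_list_clean)
```

Indexing by position with `x[[col]]` sidesteps the differing column names (`gov_levels` and `gov_types`), and the length check makes a data frame with a missing or extra "General government" row visible instead of silently recycling labels.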

If the column is already a character vector, the levels step is not needed:

df$gov_levels[df$gov_levels=="General government"] <- 
      c("Federal general government", "State and local general government")