Replacing multiple matches of the same string with different patterns in R


Tags: r, regex, dplyr


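I have a set of data frames. In each of them the same string appears several times in one column, but each occurrence reflects a different observation.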

library(dplyr)
library(stringi)  # provides stri_replace_first_fixed(), used below
govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local", 
          "General government", "Government enterprises")
df <- data.frame("gov_levels" = govs, revenue = rnorm(7, mean = 1000, sd = 50))
df

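However, the result depends on whether "General government" falls on an even or an odd row, because the two-element replacement vector is recycled along the rows. That makes it inconsistent, as shown when I drop the first row before the mutate: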

    df %>%
    filter(gov_levels != "Government") %>%
    mutate(gov_levels = stri_replace_first_fixed(str = gov_levels, pattern = "General government", 
                                           replacement = c("Federal general government", 
                                                           "State and local general government")))

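This applies the replacements in the wrong order. I am looking for a way to apply them consistently, so that the result does not depend on the row position of the string being replaced. That is, the first match should always be replaced by "Federal general government" and the second match by "State and local general government".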
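To make the row-position dependence concrete, here is a minimal base R sketch of the recycling. It does not call stringi. It only models, with `rep_len()`, how a two-element replacement vector is paired with the rows. The helper name `paired_replacement` is hypothetical and exists purely for illustration.

```r
replacement <- c("Federal general government",
                 "State and local general government")
govs <- c("Government", "Federal", "General government",
          "Government enterprises", "State and local",
          "General government", "Government enterprises")

# Hypothetical helper: recycle the replacement vector to the length of the
# input and report which element ends up paired with each matching row.
paired_replacement <- function(x, target, replacement) {
  recycled <- rep_len(replacement, length(x))
  recycled[x == target]
}

# Full vector: matches sit on rows 3 and 6 (odd, then even)
with_first <- paired_replacement(govs, "General government", replacement)

# First row dropped: matches sit on rows 2 and 5 (even, then odd)
without_first <- paired_replacement(govs[-1], "General government", replacement)

print(with_first)     # Federal first, State and local second
print(without_first)  # order reversed
```

Under this model the pairing flips as soon as the matches move by one row, which is the inconsistency described above.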

Update based on George's answer: the list of data frames has some inconsistencies:

govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local", 
          "General government", "Government enterprises", NA, NA)

df1 <- data.frame("col_1" = "col1data", "gov_levels" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)
df2 <- data.frame("col_1" = "col1data", "gov_types" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)

df2 <- df2 %>%
       filter(gov_types != "Government")

df_list <- list(df1, df2)

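Does this achieve what you are after? Simply replace the elements that meet the condition with the desired vector of strings. First, though, you need to add the new values to the allowed levels of your factor. Otherwise the assignment fails: R warns about an invalid factor level and inserts NA.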

# First define a character vector containing the new levels for the 'gov_levels' factor
newlevels <- c("Federal general government", "State and local general government")

# Then add them so that they are allowed as factor levels
levels(df$gov_levels) <- c(levels(df$gov_levels), newlevels)

# Now just replace the values where 'gov_levels' is "General government" with the new string
# They will naturally be assigned in the same order they occur in the dataset
df$gov_levels[df$gov_levels=="General government"] <- newlevels
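As a quick check that this assignment really is independent of row position, the following sketch wraps the same three steps in a function and runs it with and without the first row. The function name `relabel_factor` is an assumption made for this example. The sketch also relies on there being exactly as many matches as new labels, otherwise R would recycle or warn.

```r
govs <- c("Government", "Federal", "General government",
          "Government enterprises", "State and local",
          "General government", "Government enterprises")
new_labels <- c("Federal general government",
                "State and local general government")

# Hypothetical wrapper around the steps above: add the levels, then assign
# the new labels to the matching positions in order of appearance.
relabel_factor <- function(x) {
  f <- factor(x)
  levels(f) <- c(levels(f), new_labels)
  f[f == "General government"] <- new_labels
  droplevels(f)  # drop the now-unused "General government" level
}

with_first    <- relabel_factor(govs)      # matches on rows 3 and 6
without_first <- relabel_factor(govs[-1])  # matches on rows 2 and 5

print(as.character(with_first[c(3, 6)]))
print(as.character(without_first[c(2, 5)]))
```

Both calls give the Federal label to the first match and the State and local label to the second, because logical subsetting fills positions in order of occurrence rather than by row parity.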

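Thanks, this is very helpful. Yes, in this case I always need to match both occurrences and distinguish Federal from S&L. I am working with a series of similar data frames, so I am extending this solution to work with lapply. That raises a few problems: 1) the gov_levels column is of type chr rather than factor in all the dfs, 2) there are NAs in the column, and 3) the column name differs in some data frames. That makes it less straightforward, but I think I have a solution.

@CaseyR If your vector is a character vector to begin with, then you don't need the levels() bit at all. I will edit.

I'm also not sure why the NAs are a problem, but I get an error message referring to x[, 2][x[, 2] == "General government"]. I should add that the column names differ, but the relevant data is always in the second column.

@CaseyR Good point, subscripted assignments can't have NA. Since you know it is the second column, why not just use x[, 2] as you did? Any other problems?

The solution extended to the list of data frames with lapply:
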
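newlevels_gen <- c("Federal general government", "State and local general government")

# Convert the second column of every data frame to a factor
df_list <- lapply(df_list, function(x) {
  x[, 2] <- as.factor(x[, 2])
  return(x)
})

# Add the new labels to the allowed factor levels
df_list <- lapply(df_list, function(x) {
  levels(x[, 2]) <- c(levels(x[, 2]), newlevels_gen)
  return(x)
})

# Replace the matches in order of appearance; the !is.na() guard keeps NA
# out of the logical subscript, which subscripted assignment does not allow
df_list_clean_a <- lapply(df_list, function(x) {
  x[, 2][!is.na(x[, 2]) & x[, 2] == "General government"] <- newlevels_gen
  return(x)
})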
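Following the comment that the levels() step is unnecessary for character columns, the three passes can be folded into one. The sketch below is one possible variant, not the method posted in the thread. It keeps the second column as character and uses `which()`, which drops NA comparisons, in place of the explicit `!is.na()` guard. The function name `relabel_second_column`, the fixed revenue numbers, and the base R stand-in for the dplyr `filter()` call are all assumptions made so that the example runs on its own.

```r
govs <- c("Government", "Federal", "General government",
          "Government enterprises", "State and local",
          "General government", "Government enterprises", NA, NA)
new_labels <- c("Federal general government",
                "State and local general government")

# Placeholder revenue values (assumed) so the example is deterministic
df1 <- data.frame(col_1 = "col1data", gov_levels = govs,
                  revenue = c(101:107, NA, NA), stringsAsFactors = FALSE)
df2 <- data.frame(col_1 = "col1data", gov_types = govs,
                  revenue = c(101:107, NA, NA), stringsAsFactors = FALSE)
# Base R stand-in for filter(gov_types != "Government"); like filter(),
# it also drops the rows where the comparison is NA
df2 <- df2[which(df2$gov_types != "Government"), ]

# Hypothetical helper: relabel the matches in column 2 by order of appearance
relabel_second_column <- function(x, target = "General government",
                                  labels = new_labels) {
  values <- as.character(x[[2]])
  hits <- which(values == target)  # NA comparisons are dropped by which()
  if (length(hits) != length(labels)) {
    stop("expected ", length(labels), " matches, found ", length(hits))
  }
  values[hits] <- labels
  x[[2]] <- values
  x
}

df_list_clean <- lapply(list(df1, df2), relabel_second_column)
print(df_list_clean[[1]][[2]])
print(df_list_clean[[2]][[2]])
```

The explicit length check is a design choice: it stops with an error when a data frame does not contain exactly two matches, where plain assignment would silently recycle the labels.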


# The same assignment written inline, without the intermediate vector
df$gov_levels[df$gov_levels == "General government"] <-
  c("Federal general government", "State and local general government")