R: How to create new columns based on matching row categories?
I am trying to add new columns to a data frame based on matching values in its rows. An example of my starting data is below:
ex <- structure(list(reg_desc = c("1-Northeast Region", "1-Northeast Region",
"1-Northeast Region", "1-Northeast Region", "1-Northeast Region"
), state = c("04-Connecticut", "05-Maine", "04-Connecticut",
"05-Maine", NA), trigger_city = c("14860-Bridgeport-Stamford-Norwalk",
"12620-Bangor", NA, NA, NA), Category = c("M", "M", "S", "S",
"R"), Cred_Fac = c(0, 0, 0.317804971641414, 0, 1), Mean = c(50323.3311111111,
48944.4266666667, 44220.8220792079, 43724.1495, 50492.0654351396
)), row.names = c(1L, 7L, 118L, 119L, 136L), class = "data.frame")
ex
How can I get these values into 4 new columns? The desired result:
hi1 <- data.frame(reg_desc = c("1-Northeast Region", "1-Northeast Region",
"1-Northeast Region", "1-Northeast Region", "1-Northeast Region"
), state = c("04-Connecticut", "05-Maine", "04-Connecticut",
"05-Maine", NA), trigger_city = c("14860-Bridgeport-Stamford-Norwalk",
"12620-Bangor", NA, NA, NA), Category = c("M", "M", "S", "S",
"R"), Cred_Fac = c(0, 0, 0.317804971641414, 0, 1), Mean = c(50323.3311111111,
48944.4266666667, 44220.8220792079, 43724.1495, 50492.0654351396),
State_Cred_Fac = c(0.317805,0.000000,NA,NA,NA),Mean_State = c(44220.82,43724.15,NA,NA,NA),
Reg_Cred_Fac = c(1.000000,1.000000,1.000000,1.000000,NA),
Mean_Region = c(50492.07,50492.07,50492.07,50492.07,NA))
If I understood your question correctly, one possible solution is to use join operations (in this case from the dplyr package):
library(dplyr)
ex %>%
  # State level: rows with a state but no trigger city hold the state values
  dplyr::left_join(ex %>%
                     dplyr::filter(is.na(trigger_city) & !is.na(state)) %>%
                     dplyr::select(state, Cred_Fac, Mean),
                   by = "state"
  ) %>%
  # Region level: rows with neither state nor trigger city hold the region values
  dplyr::left_join(ex %>%
                     dplyr::filter(is.na(trigger_city) & is.na(state)) %>%
                     dplyr::select(reg_desc, Cred_Fac, Mean),
                   by = "reg_desc"
  ) %>%
  # Blank out the looked-up values on rows that are themselves the state or region level
  dplyr::mutate(Cred_Fac.y = ifelse(is.na(trigger_city), NA, Cred_Fac.y),
                Mean.y = ifelse(is.na(trigger_city), NA, Mean.y),
                Cred_Fac = ifelse(is.na(state), NA, Cred_Fac),
                Mean = ifelse(is.na(state), NA, Mean)) %>%
  # Rename by position: columns 7-8 come from the state join, 9-10 from the region join
  dplyr::select(reg_desc = 1, state = 2, trigger_city = 3, Category = 4, Cred_Fac = 5, Mean = 6,
                State_Cred_Fac = 7, Mean_State = 8, Reg_Cred_Fac = 9, Mean_Region = 10)
            reg_desc          state                      trigger_city Category Cred_Fac     Mean State_Cred_Fac Mean_State Reg_Cred_Fac Mean_Region
1 1-Northeast Region 04-Connecticut 14860-Bridgeport-Stamford-Norwalk        M 0.000000 50323.33       0.317805   44220.82            1    50492.07
2 1-Northeast Region       05-Maine                      12620-Bangor        M 0.000000 48944.43       0.000000   43724.15            1    50492.07
3 1-Northeast Region 04-Connecticut                              <NA>        S 0.317805 44220.82             NA         NA            1    50492.07
4 1-Northeast Region       05-Maine                              <NA>        S 0.000000 43724.15             NA         NA            1    50492.07
5 1-Northeast Region           <NA>                              <NA>        R 1.000000 50492.07             NA         NA           NA          NA
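If you would rather not depend on dplyr, the same lookup can probably be done in base R with match(). This is only a sketch: it rebuilds the example data and reuses the row rules from the join answer (no trigger city but a state means a state row; neither means a region row). The names state_rows, region_rows, s_idx, r_idx and res are made up for illustration.

```r
# Example data from the question, rebuilt so the snippet runs on its own
ex <- data.frame(
  reg_desc = rep("1-Northeast Region", 5),
  state = c("04-Connecticut", "05-Maine", "04-Connecticut", "05-Maine", NA),
  trigger_city = c("14860-Bridgeport-Stamford-Norwalk", "12620-Bangor", NA, NA, NA),
  Category = c("M", "M", "S", "S", "R"),
  Cred_Fac = c(0, 0, 0.317804971641414, 0, 1),
  Mean = c(50323.3311111111, 48944.4266666667, 44220.8220792079,
           43724.1495, 50492.0654351396),
  stringsAsFactors = FALSE
)

# Lookup tables, using the same filters as the join-based answer
state_rows  <- ex[is.na(ex$trigger_city) & !is.na(ex$state), ]
region_rows <- ex[is.na(ex$trigger_city) & is.na(ex$state), ]

# Position of each row's state / region in its lookup table (NA when there is no match)
s_idx <- match(ex$state, state_rows$state)
r_idx <- match(ex$reg_desc, region_rows$reg_desc)

res <- ex
# City rows receive the values of their state; all other rows get NA
res$State_Cred_Fac <- ifelse(is.na(ex$trigger_city), NA_real_, state_rows$Cred_Fac[s_idx])
res$Mean_State     <- ifelse(is.na(ex$trigger_city), NA_real_, state_rows$Mean[s_idx])
# City and state rows receive the values of their region; region rows get NA
res$Reg_Cred_Fac   <- ifelse(is.na(ex$state), NA_real_, region_rows$Cred_Fac[r_idx])
res$Mean_Region    <- ifelse(is.na(ex$state), NA_real_, region_rows$Mean[r_idx])

res
```

One difference from the joins: match() only returns the first hit, so a state or region that appears more than once in a lookup table will not duplicate rows in the result, whereas left_join() would return one row per match.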