R: how to set column C values based on the initial values in columns A, B, and C
I have the following table:
A B C
food fruit apple
food fruit
food drink
food fruit
car suv ford
car sedan bmw
car suv
car sedan
Expected result:
A B C
food fruit apple
food fruit apple
food drink
food fruit apple
car suv ford
car sedan bmw
car suv ford
car sedan bmw
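How can I fill in column C based on the values in columns A and B? For example, if the value in column A is food and the value in column B is fruit, then column C should be filled with apple. Ideally, I would like to do this without manually entering each A/B pair and its corresponding C value, because my table has thousands of such combinations.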
Any help is much appreciated.
Here is a solution using data.table:
library(data.table)
setDT(dx)[,id:=1:.N] ## create variable to conserve origin order
dx[,C:={
val <- unique(C[nzchar(C)])
if(length(val)==0) val <- "" ## case empty C
if(length(val)>1) val <- val[1] ## case multiple values
rep(val,length(C))
}, "A,B"][order(id)][,id:=NULL]
# A B C
# 1: food fruit apple
# 2: food fruit apple
# 3: food drink
# 4: food fruit apple
# 5: car suv ford
# 6: car sedan bmw
# 7: car suv ford
# 8: car sedan bmw
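The same group-wise logic (take the first non-empty C within each A/B group and repeat it) can also be sketched in base R with ave(), which may help readers who do not have data.table installed. This is only an illustration: the construction of dx below is an assumption based on the question's table (blanks taken to be empty strings), and fill_group is a hypothetical helper name.

```r
# Sample data reconstructed from the question (assumed: blanks are empty strings)
dx <- data.frame(
  A = c("food", "food", "food", "food", "car", "car", "car", "car"),
  B = c("fruit", "fruit", "drink", "fruit", "suv", "sedan", "suv", "sedan"),
  C = c("apple", "", "", "", "ford", "bmw", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-empty value of a group, repeated for every
# row of that group, or "" when the group has no value at all
fill_group <- function(x) {
  val <- x[nzchar(x)]
  rep(if (length(val) == 0) "" else val[1], length(x))
}

# ave() applies fill_group within each A/B combination and keeps the row order
dx$C <- ave(dx$C, dx$A, dx$B, FUN = fill_group)
dx
```

Because ave() writes the values back in place, no helper id column is needed to restore the original order.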
Two alternatives using data.table:
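library(data.table)
# Option 1: within each (A, B) group, assign the non-empty value of C
setDT(d1)[, C := C[C != ''], by = .(A,B)][]
# Option 2: same idea, but keep '' for groups where every C is empty
setDT(d1)[, C := ifelse(all(C == ''), '', C[C != '']), by = .(A,B)][]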
Both give the same result.
An alternative using dplyr:
library(dplyr)
d1 %>%
group_by(A, B) %>%
summarise(C = ifelse(all(C == ''), '', C[C != ''])) %>%
right_join(., d1, by = c('A','B')) %>%
select(A, B, C = C.x)
This gives a similar result.
A solution using fill from tidyr:
library(dplyr)
library(tidyr)
df %>%
mutate(C = ifelse(C == "", NA, C)) %>%
group_by(A, B) %>%
fill(C)
Result:
# A tibble: 8 x 3
# Groups: A, B [4]
A B C
<chr> <chr> <chr>
1 car sedan bmw
2 car sedan bmw
3 car suv ford
4 car suv ford
5 food drink <NA>
6 food fruit apple
7 food fruit apple
8 food fruit apple
# A tibble: 8 x 3
# Groups: A, B [4]
A B C
<chr> <chr> <chr>
1 food fruit apple
2 food fruit apple
3 food drink <NA>
4 food fruit apple
5 car suv ford
6 car sedan bmw
7 car suv ford
8 car sedan bmw
Data:
df = structure(list(A = c("food", "food", "food", "food", "car", "car",
"car", "car"), B = c("fruit", "fruit", "drink", "fruit", "suv",
"sedan", "suv", "sedan"), C = c("apple", "", "", "", "ford",
"bmw", "", "")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA,
-8L))
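One caveat with the fill approach: fill() fills downward by default, so a blank that appears before the first known value in its group would stay NA. A minimal sketch of one way around that, assuming a reasonably recent tidyr in which fill() accepts .direction = "downup"; the data frame df2 is made up for illustration and is not from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the known value sits in the LAST row of the group
df2 <- data.frame(
  A = c("food", "food", "food"),
  B = c("fruit", "fruit", "fruit"),
  C = c("", "", "apple"),
  stringsAsFactors = FALSE
)

filled <- df2 %>%
  mutate(C = na_if(C, "")) %>%          # treat blanks as missing
  group_by(A, B) %>%
  fill(C, .direction = "downup") %>%    # fill down first, then up
  ungroup()

filled
```

With the default direction the first two rows here would remain NA, since there is nothing above them to carry down.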
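I have successfully used your suggestion: setDT(d1)[, C := C[C != ''], by = .(A,B)][]. I would like to know how to apply this if I have an additional column, D, that also needs to be filled. For example, one row has the values A = food, B = fruit, C = apple, D = red. If another row has A = food, B = fruit, C = (blank), D = (blank), I want to fill the columns in that row with C = apple, D = red. Any help is much appreciated. Thanks!
Any suggestions?
@user6340762 Sorry, I forgot to get back to this because of the New Year's Eve celebrations; a possible solution: setDT(d1)[, `:=` (C = C[C != ''], D = D[D != '']), by = .(A,B)]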
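No problem at all, completely understandable! That seems to work well. Thanks! Any suggestions on how, or whether, I can ignore certain exceptions? For example, most rows have the values A = food, B = fruit, C = apple, D = red, but a few rows have A = food, B = fruit, C = banana, D = yellow. How can I ignore these exceptions when filling in the other blank rows, so that if A = food and B = fruit, all blank C and D cells are always filled with C = apple and D = red (not C = banana, D = yellow)? Thanks!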
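This last question is not answered in the thread. One possible approach, offered only as a sketch, is to fill each blank cell with the most frequent non-empty value of its A/B group, leaving the exception rows as they are. The data d2 and the helper name most_common are hypothetical, and the sketch assumes blanks are empty strings.

```r
library(data.table)

# Hypothetical data: apple/red is the common case, banana/yellow the exception
d2 <- data.table(
  A = rep("food", 6),
  B = rep("fruit", 6),
  C = c("apple", "apple", "banana", "", "apple", ""),
  D = c("red", "red", "yellow", "", "red", "")
)

# Hypothetical helper: most frequent non-empty value of x, or "" if none
most_common <- function(x) {
  vals <- x[nzchar(x)]
  if (length(vals) == 0) return("")
  names(which.max(table(vals)))
}

# Replace only the blank cells; rows that already hold a value are untouched
cols <- c("C", "D")
d2[, (cols) := lapply(.SD, function(x) ifelse(nzchar(x), x, most_common(x))),
   by = .(A, B), .SDcols = cols]
d2
```

Note that C and D are filled independently here, each with its own most frequent value. If the C/D pair must always come from the same row, the mode would have to be computed on the combined pair instead. Ties are resolved by which.max(), which picks the first value in the table's sorted order.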