R: How to set column C values based on the initial values of columns A, B, and C (data.table)

I have the following table:

A     B       C
food  fruit   apple
food  fruit   
food  drink
food  fruit   
car   suv     ford
car   sedan   bmw
car   suv
car   sedan
Expected result:

 A     B       C
food  fruit   apple
food  fruit   apple
food  drink
food  fruit   apple 
car   suv     ford
car   sedan   bmw
car   suv     ford
car   sedan   bmw
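How can I fill in column C based on the values in columns A and B? For example, if the value in column A is food and the value in column B is fruit, column C should be filled with apple. Ideally, I would like to do this without manually entering the A, B column pairs and their corresponding C values, because my table has thousands of such combinations.

Any help is greatly appreciated.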

Here is a solution using data.table:

library(data.table)
setDT(dx)[,id:=1:.N] ## create variable to conserve origin order

dx[,C:={
  val <- unique(C[nzchar(C)])  
  if(length(val)==0) val <- ""    ## case empty C
  if(length(val)>1) val <- val[1] ## case multiple values

  rep(val,length(C))
  }, "A,B"][order(id)][,id:=NULL]

#       A     B     C
# 1: food fruit apple
# 2: food fruit apple
# 3: food drink      
# 4: food fruit apple
# 5:  car   suv  ford
# 6:  car sedan   bmw
# 7:  car   suv  ford
# 8:  car sedan   bmw
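
The same fill can also be sketched as a lookup table plus an update join. Because the join assigns by reference, the original row order is kept without a helper id column. This is a different technique from the grouped assignment above, not the answerer's method. The names lookup and C_fill are made up for illustration, blanks are assumed to be empty strings, and fifelse assumes data.table 1.12.4 or later.

```r
library(data.table)

# Sample data from the question; blanks in C are assumed to be empty strings.
dx <- data.table(
  A = c("food", "food", "food", "food", "car", "car", "car", "car"),
  B = c("fruit", "fruit", "drink", "fruit", "suv", "sedan", "suv", "sedan"),
  C = c("apple", "", "", "", "ford", "bmw", "", "")
)

# Lookup table (hypothetical name): first non-empty C for each (A, B) pair.
lookup <- dx[C != "", .(C_fill = C[1]), by = .(A, B)]

# Update join: rows of dx that match a lookup row get the lookup value
# when their own C is empty; rows without a match are left untouched.
dx[lookup, on = .(A, B), C := fifelse(C == "", i.C_fill, C)]

print(dx)
```

Groups that have no non-empty value at all (food/drink here) never appear in the lookup table, so they stay blank.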
Two alternatives using data.table:

library(data.table)
setDT(d1)[, C := C[C != ''], by = .(A,B)][]
setDT(d1)[, C := ifelse(all(C == ''), '', C[C != '']), by = .(A,B)][]
Both give the same result.


An alternative using dplyr:

library(dplyr)
d1 %>% 
  group_by(A, B) %>% 
  summarise(C = ifelse(all(C == ''), '', C[C != ''])) %>% 
  right_join(., d1, by = c('A','B')) %>% 
  select(A, B, C = C.x)

This gives a similar result.

A solution using fill from tidyr:

library(dplyr)
library(tidyr)

df %>%
  mutate(C = ifelse(C == "", NA, C)) %>%
  group_by(A, B) %>%
  fill(C) 
Result:

# A tibble: 8 x 3
# Groups:   A, B [4]
      A     B     C
  <chr> <chr> <chr>
1   car sedan   bmw
2   car sedan   bmw
3   car   suv  ford
4   car   suv  ford
5  food drink  <NA>
6  food fruit apple
7  food fruit apple
8  food fruit apple
# A tibble: 8 x 3
# Groups:   A, B [4]
      A     B     C
  <chr> <chr> <chr>
1  food fruit apple
2  food fruit apple
3  food drink  <NA>
4  food fruit apple
5   car   suv  ford
6   car sedan   bmw
7   car   suv  ford
8   car sedan   bmw
df = structure(list(A = c("food", "food", "food", "food", "car", "car", 
"car", "car"), B = c("fruit", "fruit", "drink", "fruit", "suv", 
"sedan", "suv", "sedan"), C = c("apple", "", "", "", "ford", 
"bmw", "", "")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-8L))
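
One caveat with fill: by default it only fills downwards, so it works on the data above because the non-empty value happens to come first in each group. A minimal sketch for the case where the known value can sit anywhere in the group, assuming tidyr 1.0.0 or later (for .direction = "downup") and using a made-up sample in which the blanks come first:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: the known value is NOT the first row of its group.
df2 <- data.frame(
  A = c("food", "food", "food", "food", "car", "car"),
  B = c("fruit", "fruit", "fruit", "drink", "suv", "suv"),
  C = c("", "apple", "", "", "", "ford"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  mutate(C = na_if(C, "")) %>%          # treat blanks as missing
  group_by(A, B) %>%
  fill(C, .direction = "downup") %>%    # fill down first, then up
  ungroup() %>%
  mutate(C = replace_na(C, ""))         # groups with no value go back to blank

print(res)
```

The final replace_na step is optional; drop it if NA is preferred over an empty string for groups that have no value.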

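I have successfully used your suggestion:
setDT(d1)[, C := C[C != ''], by = .(A,B)][]
I was wondering how I could apply this if I have an additional column, D, that also needs to be filled in. For example, one row has the values A = food, B = fruit, C = apple, D = red. If another row has A = food, B = fruit, C = (blank), D = (blank), I would like to fill in that row with C = apple, D = red. Any help is greatly appreciated. Thanks!

Any suggestions?

@user6340762 Sorry, I forgot to get back to this because of the New Year's Eve celebrations. A possible solution:
setDT(d1)[, `:=` (C = C[C != ''], D = D[D != '']), by = .(A,B)]

No problem at all, completely understandable! This seems to work well. Thanks! Any suggestions on whether and how I can ignore certain exceptions? For example, most rows have the values A = food, B = fruit, C = apple, D = red, but a few rows have A = food, B = fruit, C = banana, D = yellow. How can I ignore these exceptions when filling in the blank rows, so that whenever A = food and B = fruit, blank C and D columns are always filled with C = apple and D = red (not C = banana, D = yellow)? Thanks!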
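
The thread does not include an answer to this last question. One possible reading is that "exceptions" means the less frequent (C, D) combinations within an (A, B) group, so blanks should take the most frequent combination. The following is a sketch under that assumption only; the sample data and the names counts, top, C_fill and D_fill are made up for illustration, and fifelse assumes data.table 1.12.4 or later.

```r
library(data.table)

# Hypothetical sample: (apple, red) is the usual combination for food/fruit,
# (banana, yellow) is the rare exception.
d1 <- data.table(
  A = c("food", "food", "food", "food", "food", "car", "car"),
  B = c("fruit", "fruit", "fruit", "fruit", "fruit", "suv", "suv"),
  C = c("apple", "apple", "banana", "", "", "ford", ""),
  D = c("red", "red", "yellow", "", "", "black", "")
)

# Count every non-blank (C, D) combination within each (A, B) group.
counts <- d1[C != "" & D != "", .N, by = .(A, B, C, D)]

# Keep the most frequent combination per group
# (on ties, the combination that appears first in the data wins).
top <- counts[order(-N), .(C_fill = C[1], D_fill = D[1]), by = .(A, B)]

# Update join: only rows where both C and D are blank are overwritten,
# so existing exceptions such as (banana, yellow) are left as they are.
d1[top, on = .(A, B), `:=`(
  C = fifelse(C == "" & D == "", i.C_fill, C),
  D = fifelse(C == "" & D == "", i.D_fill, D)
)]

print(d1)
```

If the "correct" combination is known in advance rather than inferred from frequency, top could instead be a hand-written table of (A, B, C_fill, D_fill) rows used in the same join.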