Merging data by group in R

I construct the following data.frame object:

name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)
First, I drop the rows with missing incidents using

clean_data <- data[!is.na(incidents), ]
Next, I compute the mean number of incidents by gender with

agg <- aggregate(incidents ~ gender, clean_data, mean)
Now I would like to fill the NA values in incidents with the values from agg, so that data =

    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie      19.0 FEMALE
What is the simplest way to do this using base R?

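You can use ave. It returns the group mean for every row (vals), in the same order as the original dataset. You then locate the NA elements of the incidents column and replace them with the corresponding elements of vals.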

 vals <- with(data, ave(incidents, gender, FUN= function(x)
                                         mean(x, na.rm=TRUE)))
 indx1 <- is.na(data$incidents)
 data$incidents[indx1] <- vals[indx1]
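
To see why this works, it can help to look at the intermediate vector. The following is a minimal sketch that rebuilds the example data under the name df (a name chosen here only to avoid masking base R's data function; it is not from the original answer) and shows what ave returns before the NA is filled in.

```r
# Rebuild the example data; `df` is a name chosen for this sketch
df <- data.frame(
  name      = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender    = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# ave() returns one value per row: the mean of that row's gender group
vals <- ave(df$incidents, df$gender, FUN = function(x) mean(x, na.rm = TRUE))
vals
# [1] 687.5  19.0 687.5  19.0  19.0

# Only the positions that are NA take the group mean
na_rows <- is.na(df$incidents)
df$incidents[na_rows] <- vals[na_rows]
df$incidents
# [1]  133   36 1242    2   19
```

Because vals has the same length and order as the rows of df, the logical index na_rows can be used on both sides of the assignment.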
A shorter version, as shown by @MrFlick in the comments, uses ifelse to replace the NA elements with the group mean:

 data$incidents <- with(data, ave(incidents, gender,
           FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)))

Instead of ifelse, replace can also be used, as @Ananda Mahto shows in the data.table answer.
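
For a base R variant of the same idea, replace can be dropped into the ave call directly. This is only a sketch under the assumption that the data are rebuilt as df (a name chosen for illustration, not taken from the answers):

```r
# Rebuild the example data; `df` is a name chosen for this sketch
df <- data.frame(
  name      = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender    = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# Within each gender group, overwrite the NA positions with the group mean
df$incidents <- ave(
  df$incidents, df$gender,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
)
df$incidents
# [1]  133   36 1242    2   19
```

replace(x, is.na(x), value) is equivalent to assigning value to x[is.na(x)] on a copy of x, so groups without any NA are returned unchanged.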

For variety, here is an approach using data.table, which also demonstrates the replace function:

library(data.table)
as.data.table(data)[
  , incidents := replace(incidents, is.na(incidents), 
                         mean(incidents, na.rm = TRUE)), 
  by = gender][]
#      name incidents gender
# 1:  Homer       133   MALE
# 2:  Marge        36 FEMALE
# 3:   Bart      1242   MALE
# 4:   Lisa         2 FEMALE
# 5: Maggie        19 FEMALE
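
Since the question asks specifically about reusing agg, one further base R option is to treat agg as a lookup table and index it with match. This is a sketch rather than one of the posted answers; the names df and na_rows are chosen here for illustration. It relies on the formula method of aggregate dropping rows with NA by default (na.action = na.omit), so the separate clean_data step is not needed.

```r
# Rebuild the example data; `df` is a name chosen for this sketch
df <- data.frame(
  name      = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender    = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# Per-gender means; rows with NA incidents are omitted by the formula method
agg <- aggregate(incidents ~ gender, data = df, FUN = mean)

# For each NA row, look up its gender in agg and take that group's mean
na_rows <- is.na(df$incidents)
df$incidents[na_rows] <- agg$incidents[match(df$gender[na_rows], agg$gender)]
df$incidents
# [1]  133   36 1242    2   19
```

Compared with ave, this keeps the summary table around, which may be convenient if the group means are needed again later.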

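I had the same idea, but without data$incidents
@MrFlick That looks much better. You could post it as a new answer. In fact, I did not look at the result before applying ave.
Honestly, I think using ave on the full data.frame is the secret here. My answer would seem redundant.
So what should the complete solution be? Am I performing too many steps?
@GeoffLittle MrFlick's version is the shortest.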