R: Merging similar rows in a data frame
I have a data frame:
Title Date year lai biomass grain_wt wet_yield
1 HartogSowN 2014-07-31 2014 4.4 NA NA NA
2 HartogMild 2014-07-31 2014 3.7 NA NA NA
3 HartogSevere 2014-07-31 2014 2.3 NA NA NA
4 HartogSowN 2014-08-12 2014 6.1 NA NA NA
5 HartogMild 2014-08-12 2014 6.6 NA NA NA
6 HartogSevere 2014-08-12 2014 3.8 NA NA NA
7 HartogSowN 2014-11-10 2014 NA 16116 NA NA
8 HartogMild 2014-11-10 2014 NA 18224 NA NA
9 HartogSevere 2014-11-10 2014 NA 18184 NA NA
10 HartogSowN 2014-11-10 2014 NA NA 0.041 NA
11 HartogMild 2014-11-10 2014 NA NA 0.040 NA
12 HartogSevere 2014-11-10 2014 NA NA 0.038 NA
13 HartogSowN 2014-08-12 2014 NA 4511 NA NA
14 HartogMild 2014-08-12 2014 NA 4525 NA NA
15 HartogSevere 2014-08-12 2014 NA 3167 NA NA
16 HartogSowN 2014-07-31 2014 NA 2837 NA NA
17 HartogMild 2014-07-31 2014 NA 2444 NA NA
18 HartogSevere 2014-07-31 2014 NA 1940 NA NA
19 HartogSowN 2014-11-10 2014 NA NA NA 8457.4
20 HartogMild 2014-11-10 2014 NA NA NA 8662.4
21 HartogSevere 2014-11-10 2014 NA NA NA 8537.8
22 HartogSowN 2014-11-10 2014 NA NA NA NA
23 HartogMild 2014-11-10 2014 NA NA NA NA
24 HartogSevere 2014-11-10 2014 NA NA NA NA
structure(list(Title = c("HartogSowN", "HartogMild", "HartogSevere",
"HartogSowN", "HartogMild", "HartogSevere", "HartogSowN",
"HartogMild", "HartogSevere", "HartogSowN", "HartogMild",
"HartogSevere", "HartogSowN", "HartogMild", "HartogSevere",
"HartogSowN", "HartogMild", "HartogSevere", "HartogSowN",
"HartogMild", "HartogSevere", "HartogSowN", "HartogMild",
"HartogSevere"), Date = structure(c(16282, 16282, 16282, 16294,
16294, 16294, 16384, 16384, 16384, 16384, 16384, 16384, 16294, 16294,
16294, 16282, 16282, 16282, 16384, 16384, 16384, 16384, 16384,
16384), class = "Date"), year = c(2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014), lai = c(4.4,
3.7, 2.3, 6.1, 6.6, 3.8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), biomass = c(NA, NA, NA, NA, NA, NA,
16116, 18224, 18184, NA, NA, NA, 4511, 4525, 3167, 2837, 2444, 1940,
NA, NA, NA, NA, NA, NA), grain_wt = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.041, 0.04, 0.038, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), wet_yield = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 8457.4,
8662.4, 8537.8, NA, NA, NA)), .Names = c("Title", "Date", "year", "lai", "biomass", "grain_wt", "wet_yield"), row.names = c(NA, 24L),
class = "data.frame")
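I would like to collapse these rows so that all the data for a given Title and Date combination sits on a single row, and remove the redundant rows. I have found answers to similar questions, but they all involve modifying the original data.
Desired output: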
Title Date year lai biomass grain_wt wet_yield
1 HartogSowN 2014-07-31 2014 4.4 2837 NA NA
2 HartogMild 2014-07-31 2014 3.7 2444 NA NA
3 HartogSevere 2014-07-31 2014 2.3 1940 NA NA
4 HartogSowN 2014-08-12 2014 6.1 4511 NA NA
5 HartogMild 2014-08-12 2014 6.6 4525 NA NA
6 HartogSevere 2014-08-12 2014 3.8 3167 NA NA
7 HartogSowN 2014-11-10 2014 NA 16116 0.041 8457.4
8 HartogMild 2014-11-10 2014 NA 18224 0.040 8662.4
9 HartogSevere 2014-11-10 2014 NA 18184 0.038 8537.8
22 HartogSowN 2014-11-10 2014 NA NA NA NA
23 HartogMild 2014-11-10 2014 NA NA NA NA
24 HartogSevere 2014-11-10 2014 NA NA NA NA
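with the extra rows that held biomass, grain weight, and wet yield removed.
Update: Thanks Pascal, yes, the dates should match, my mistake. I have updated the desired output.
Update 2: Added the full desired output for clarity.

Consider the following base R solution using aggregate(). It uses median as the aggregating function, but any aggregation should work (mean, min, max, etc.), although each one treats NAs differently.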
# AGGREGATED DF
collapsedf <- aggregate(list(lai=df$lai,
                             biomass=df$biomass,
                             grain_wt=df$grain_wt,
                             wet_yield=df$wet_yield),
                        list(Title=df$Title, Date=df$Date, year=df$year),
                        FUN=median, na.rm=TRUE)
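To make the remark about NA handling concrete, here is a minimal sketch of what a few common aggregating functions return for a group whose values are all missing once na.rm=TRUE has stripped the NAs. The names all_missing and na_results are made up for this illustration and do not appear in the question.

```r
# A group in which every measurement is missing (hypothetical example data)
all_missing <- c(NA_real_, NA_real_)

# With na.rm = TRUE each function is effectively applied to an empty vector
na_results <- list(
  median = median(all_missing, na.rm = TRUE),                 # NA
  mean   = mean(all_missing, na.rm = TRUE),                   # NaN
  sum    = sum(all_missing, na.rm = TRUE),                    # 0
  max    = suppressWarnings(max(all_missing, na.rm = TRUE))   # -Inf (with a warning)
)
print(na_results)
```

So median keeps an all-missing cell as NA, which matches the desired output, whereas sum or max would silently fill such cells with 0 or -Inf.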
Assuming that for each Title/Date combination there is only one valid value per column, you can use aggregate to get the desired result:
aggregate(. ~ Title + Date + year, data=df,
FUN=function(x) x[!is.na(x)][1], na.action=na.pass)
# Title Date year lai biomass grain_wt wet_yield
#1 HartogMild 2014-07-31 2014 3.7 2444 NA NA
#2 HartogSevere 2014-07-31 2014 2.3 1940 NA NA
#3 HartogSowN 2014-07-31 2014 4.4 2837 NA NA
#4 HartogMild 2014-08-12 2014 6.6 4525 NA NA
#5 HartogSevere 2014-08-12 2014 3.8 3167 NA NA
#6 HartogSowN 2014-08-12 2014 6.1 4511 NA NA
#7 HartogMild 2014-11-10 2014 NA 18224 0.040 8662.4
#8 HartogSevere 2014-11-10 2014 NA 18184 0.038 8537.8
#9 HartogSowN 2014-11-10 2014 NA 16116 0.041 8457.4
This uses Title + Date + year as the grouping variables and processes all the remaining data columns.
The function returns only the first non-missing value, x[!is.na(x)][1], for each column within each group.
The [1] is needed to make sure NA is returned when a group has no non-missing values. For example, numeric(0)[1] returns NA.
na.action=na.pass is needed because aggregate, when used with the y ~ x formula interface, by default throws away every row that contains an NA value; na.action=na.omit is the default.
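The points above can be checked on a small made-up data frame. This is only a sketch: toy and first_non_na are hypothetical names introduced for the illustration, and the values are chosen to mimic the shape of the data in the question.

```r
# Toy data: each measurement sits on its own row, so every row contains an NA
toy <- data.frame(
  Title   = c("A", "A", "B", "B", "C"),
  Date    = as.Date(rep("2014-07-31", 5)),
  lai     = c(4.4, NA, 3.7, NA, NA),
  biomass = c(NA, 2837, NA, 2444, 1940)
)

# First non-missing value; the [1] turns an empty result into NA
first_non_na <- function(x) x[!is.na(x)][1]

# na.omit (the formula method's default) would leave nothing to aggregate,
# while na.pass keeps every row
rows_after_omit <- nrow(na.omit(toy))
rows_after_pass <- nrow(na.pass(toy))

res <- aggregate(. ~ Title + Date, data = toy,
                 FUN = first_non_na, na.action = na.pass)
print(res)
```

Group C has no lai value at all, so its lai cell comes back as NA rather than causing an error or shortening the result.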
Are you sure about the desired output? Some of the values come from different dates, don't they? The lai of 4.4 is for HartogSowN/2014-07-31, while all the other values are for HartogSowN/2014-11-10.
I think you had better show your full desired output. It is not obvious to me. In particular, your example makes no sense to me based on subset(DF, Title == "HartogSowN" & Date == "2014-07-31").
It is not clear what you want, but would aggregate(. ~ Title + Date + year, data=DF, FUN=function(x) x[!is.na(x)][1], na.action=na.pass) do it?
@thelatemail Yes, thanks, that works. Add it as an answer and I will accept it. Would you mind adding a few words of explanation? I have never used aggregate before.