R - Group by with splinefun
I am trying to group my data by Year and CountyID and then apply splinefun (cubic spline interpolation) to each subset. I am open to ideas, but the spline is a must and cannot be changed. Here is the code I am trying to use:
age <- seq(from = 0, by = 5, length.out = 18)
TOT_POP <- df %.%
group_by(unique(df$Year), unique(df$CountyID) %.%
splinefun(age, c(0, cumsum(df$TOT_POP)), method = "hyman")
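What I am currently doing is taking age groups 1:17 and splitting them into single years of age 0-84. At the moment each group represents 5 years of age. splinefun lets me do this while giving the process a degree of mathematical rigor, i.e. it lets me provide a population total for each single year of age in every county in the U.S.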
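To make the disaggregation described above concrete, here is a minimal sketch in base R on made-up counts for a single county-year. The vector group_pop and its values are invented for illustration; only splinefun with method = "hyman" and the cumulative-sum idea come from the question.

```r
# Made-up counts for 17 five-year age groups (0-4, 5-9, ..., 80-84) in one county-year
group_pop <- c(700, 720, 750, 760, 740, 710, 690, 680, 650,
               600, 540, 470, 390, 300, 210, 130, 60)

# Group boundaries: 0, 5, ..., 85 (18 knots for 17 groups)
age <- seq(from = 0, by = 5, length.out = 18)

# Cumulative population below each boundary; non-decreasing, as "hyman" requires
cum_pop <- c(0, cumsum(group_pop))

# Monotone cubic spline through the cumulative totals
cum_fun <- splinefun(age, cum_pop, method = "hyman")

# Differences between consecutive integer ages give single-year counts for ages 0..84
single_year <- pmax(0, diff(cum_fun(0:85)))

# Each block of five single years adds back up to its original group total
block_totals <- tapply(single_year, rep(1:17, each = 5), sum)
```

Because the spline passes through the cumulative totals at every boundary, the single-year values within each five-year block should sum to the original group count.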
Lastly, the splinefun code works on its own, but inside the group_by call it does not. It produces:
Error: wrong result size(4), expected 68 or 1.
The splinefun code is meant to be used like this:
TOT_POP <- splinefun(age, c(0, cumsum(df$TOT_POP)),
method = "hyman")
TOT_POP = pmax(0, diff(TOT_POP(c(0:85))))
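The reported error looks consistent with group_by() receiving vectors such as unique(df$Year) instead of column names: it tries to add them as new columns, and their length does not match the number of rows. Also, %.% was dplyr's early chaining operator and has since been replaced by %>%. Below is a sketch of the per-group spline with current dplyr, assuming dplyr 0.8.1 or later (for group_modify), columns named Year, CountyID, AgeGrp and TOT_POP, and exactly 17 age groups per county-year. The data are made up.

```r
library(dplyr)

# Made-up data: 2 counties x 2 years x 17 five-year age groups (column names assumed)
df <- expand.grid(AgeGrp = 1:17, CountyID = c(1001, 1003), Year = c(2010, 2011))
df$TOT_POP <- 1000 - 50 * df$AgeGrp + df$CountyID %% 10

# Turn 17 group counts into 85 single-year counts (ages 0..84)
interpolate <- function(x, age = seq(from = 0, by = 5, length.out = 18)) {
  cum_fun <- splinefun(age, c(0, cumsum(x)), method = "hyman")
  pmax(0, diff(cum_fun(0:85)))
}

single_year <- df %>%
  arrange(CountyID, Year, AgeGrp) %>%   # rows must be in age order before cumsum
  group_by(CountyID, Year) %>%          # bare column names, not unique() vectors
  group_modify(~ data.frame(Age = 0:84, TOT_POP = interpolate(.x$TOT_POP))) %>%
  ungroup()
```

group_modify calls the function once per group and stacks the returned data frames, so each county-year contributes 85 rows here.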
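First, I think I used the wrong wording for what I was trying to achieve, my apologies; group_by was not actually going to solve the problem. However, I was able to solve it using two functions and ddply. Here is the code that solved the issue: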
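library(plyr)  # provides colwise(), ddply(), .() and the helper true()

# Turn one column of 17 five-year group counts into 85 single-year counts
interpolate <- function(x, ageVector){
  result <- splinefun(ageVector,
                      c(0, cumsum(x)), method = "hyman")
  diff(result(c(0:85)))
}

# Apply interpolate() to every data column of one CountyID-Year subset
mainFunc <- function(df){
  age <- seq(from = 0, by = 5, length.out = 18)
  # Columns to interpolate: everything except the identifiers
  # (these names must match the column names in df exactly)
  colNames <- setdiff(colnames(df),
                      c("Year","CountyID","AgeGrp"))
  # .cols = true is plyr's default and selects all columns;
  # ageVector is passed through to interpolate()
  colWiseSpline <- colwise(interpolate, .cols = true,
                           ageVector = age)(df[ , colNames, drop = FALSE])
  cbind(data.frame(
          Year = df$Year[1],
          County = df$CountyID[1],
          Agegrp = 0:84
        ),
        colWiseSpline
  )
}

# Run mainFunc on each CountyID-Year subset and stack the results
CompleteMainRaw <- ddply(.data = df,
                         .variables = .(CountyID, Year),
                         .fun = mainFunc)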
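Let me get this straight. You want to split the data frame by two variables. Then, for each of the smaller data frames, you want to use splinefun to get a spline function that maps age to TOT_POP? And then you want to use that function to interpolate the total population for every age between 0 and 85, because your original data only has populations for ages 5, 10, 15, 20, ...? You could probably do this with split and lapply, or with plyr, and then someone would be better placed to help you with dplyr.

splinefun is to be applied to the subset of data for one county in one year, with age groups 1:17. The final age groups are single years of age 0:84.

I am still confused. Maybe you could alter the df data frame? Replace Agegrp with the associated age. For example, df$Agegrp = df$Agegrp*5 and colnames(df)[3] = "age". That might simplify your problem.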
# Reproducible data set
set.seed(22)
df = data.frame( CountyID = rep(1001:1005,each = 100),
Year = rep(2001:2010, each = 10),
Agegrp = sample(1:17, 500, replace=TRUE),
TOT_POP = rnorm(500, 10000, 2000))
# Convert Agegrp to age
df$Agegrp = df$Agegrp*5
colnames(df)[3] = "age"
# Make a spline function for every CountyID-Year combination
split.dfs = split(df, interaction(df$CountyID, df$Year))
spline.funs = lapply(split.dfs, function(x) splinefun(x[,"age"], x[,"TOT_POP"]))
# Use the spline functions to interpolate populations for all years between 0 and 85
new.split.dfs = list()
for( i in 1:length(split.dfs)) {
new.split.dfs[[i]] = data.frame( CountyID=split.dfs[[i]]$CountyID[1],
Year=split.dfs[[i]]$Year[1],
age=0:85,
TOT_POP=spline.funs[[i]](0:85))
}
# Does this do what you want? If so, then it will be
# easier for others to work from here
# > head(new.split.dfs[[1]])
# CountyID Year age TOT_POP
# 1 1001 2001 0 909033.4
# 2 1001 2001 1 833999.8
# 3 1001 2001 2 763181.8
# 4 1001 2001 3 696460.2
# 5 1001 2001 4 633716.0
# 6 1001 2001 5 574829.9
# > tail(new.split.dfs[[2]])
# CountyID Year age TOT_POP
# 81 1002 2001 80 10201.693
# 82 1002 2001 81 9529.030
# 83 1002 2001 82 8768.306
# 84 1002 2001 83 7916.070
# 85 1002 2001 84 6968.874
# 86 1002 2001 85 5923.268
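If a single data frame is more convenient than a list, the per-group pieces can be stacked with do.call(rbind, ...). The sketch below follows the same split/lapply idea on a small deterministic data set. The data, the drop = TRUE argument and the name result are assumptions for illustration, and like the code above it splines TOT_POP directly rather than the cumulative totals.

```r
# Made-up data: 2 counties x 2 years, one population value at ages 5, 10, ..., 85
df <- expand.grid(age = seq(5, 85, by = 5), CountyID = 1001:1002, Year = 2001:2002)
df$TOT_POP <- 10000 - 100 * df$age + df$CountyID

# One data frame per CountyID-Year combination; drop = TRUE skips empty combinations
split.dfs <- split(df, interaction(df$CountyID, df$Year, drop = TRUE))

# Fit a spline per piece and evaluate it at every age from 0 to 85
new.split.dfs <- lapply(split.dfs, function(x) {
  f <- splinefun(x$age, x$TOT_POP)
  data.frame(CountyID = x$CountyID[1], Year = x$Year[1],
             age = 0:85, TOT_POP = f(0:85))
})

# Stack the pieces into a single data frame
result <- do.call(rbind, new.split.dfs)
rownames(result) <- NULL
```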