R: inserting rows for missing dates and interpolating the values

I have the following data frame in R:

        Date Accumulated
1 2016-10-01     6902000
2 2016-11-01     9033000
3 2017-06-01    15033000
4 2017-11-01    24033000
5 2019-05-01    24533000
6 2019-08-01    25033000
7 2019-11-01    27533000
8 2020-06-01    29033000
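I would like to add rows for the missing months in the Date column, while applying linear or spline interpolation (preferably spline) to the Accumulated column. In other words, I need rows for 2016-12-01, 2017-01-01, 2017-02-01, 2017-03-01, and so on.

I have seen another question where people suggested using the zoo and data.table packages: first create the missing rows filled with NA, then apply the interpolation. However, I could not work out how to do this, because my data is organized differently (all of my dates are in a single column, unlike in that example). I am still fairly new to R, and handling different data types and classes is very hard for me. I am sure there is a simple way to do this.

Thank you very much.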

This should help, using a spline:

library(zoo)

#Data
df <- structure(list(Date = structure(c(17075, 17106, 17318, 17471, 
18017, 18109, 18201, 18414), class = "Date"), Accumulated = c(6902000L, 
9033000L, 15033000L, 24033000L, 24533000L, 25033000L, 27533000L, 
29033000L)), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "data.frame")

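# Build the complete monthly sequence between the first and last date
df$Date <- as.Date(df$Date)
dfm <- data.frame(Date = seq(min(df$Date), max(df$Date), by = '1 month'))
# Left join: months absent from df get NA in Accumulated
dfmerged <- merge(dfm, df, by = 'Date', all.x = TRUE)
# Fill the NA values with a cubic spline (zoo::na.spline)
dfmerged$Interpolation <- na.spline(dfmerged$Accumulated)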
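
Since the question also allows linear interpolation, here is a minimal sketch of the same merge-then-fill idea using na.approx from zoo instead of na.spline. The column name Linear and the helper name grid are only illustrative. One assumption worth stating: the Date values are passed (as numbers) through the x argument so that the interpolation is spaced by calendar days; without x, the fill is spaced by row position, which treats every month as equally long.

```r
library(zoo)

# Same data as in the question, rebuilt so this block runs on its own
df <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)

# Complete monthly grid; the left join turns missing months into NA rows
grid <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "month"))
dfmerged <- merge(grid, df, by = "Date", all.x = TRUE)

# Linear fill of the NA values, spaced by the actual dates (days since epoch)
dfmerged$Linear <- na.approx(dfmerged$Accumulated,
                             x = as.numeric(dfmerged$Date))
```

The observed rows keep their original values in Linear; only the NA rows are filled.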

You can try spline from base R, as shown below:

xout <- seq(as.Date("2016-10-01"), as.Date("2020-06-01"), by = "1 month")
yout <- with(df, spline(Date, Accumulated, xout = xout)$y)
setNames(data.frame(xout,yout),names(df))
Data

df <- structure(list(Date = structure(c(17075, 17106, 17318, 17471, 
18017, 18109, 18201, 18414), class = "Date"), Accumulated = c(6902000L,
9033000L, 15033000L, 24033000L, 24533000L, 25033000L, 27533000L,
29033000L)), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "data.frame")
df1 <- read.table(text = "
        Date Accumulated
1 2016-10-01     6902000
2 2016-11-01     9033000
3 2017-06-01    15033000
4 2017-11-01    24033000
5 2019-05-01    24533000
6 2019-08-01    25033000
7 2019-11-01    27533000
8 2020-06-01    29033000
", header = TRUE)

The following base R solution uses approxfun to create an interpolation function:

df1$Date <- as.Date(df1$Date)

f <- approxfun(df1$Date, df1$Accumulated)
d <- seq(min(df1$Date), max(df1$Date), by = "month")
df2 <- data.frame(Date = d, Accumulated = f(d))

Edit: here is a solution using splinefun:

g <- splinefun(df1$Date, df1$Accumulated)
d <- seq(min(df1$Date), max(df1$Date), by = "month")
df3 <- data.frame(Date = d, Accumulated = g(d))

library(ggplot2)

ggplot(df3, aes(Date, Accumulated)) +
  geom_point() +
  geom_line() +
  geom_point(data = df1, aes(Date, Accumulated), colour = "blue")
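
Because Accumulated is a running total, a plain cubic spline can overshoot and then dip between widely spaced observations. As a hedged sketch (not part of the original answers), base R's splinefun also accepts method = "monoH.FC", which gives a monotone interpolant when the input data are monotone. The names g_mono and df4 are illustrative, and the dates are converted to numbers explicitly before calling the function, as a precaution.

```r
# Same data as df1 above, rebuilt so this block runs on its own
df1 <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)

# Monotone Hermite spline (Fritsch-Carlson) fitted on numeric dates
g_mono <- splinefun(as.numeric(df1$Date), df1$Accumulated,
                    method = "monoH.FC")

# Evaluate on the complete monthly grid
d <- seq(min(df1$Date), max(df1$Date), by = "month")
df4 <- data.frame(Date = d, Accumulated = g_mono(as.numeric(d)))
```

df4 can be plotted with the same ggplot call as df3 to compare the two curves.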

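Probably splinefun should be the OP's preferred choice :)
Thanks! Exactly what I was looking for. Thanks to ThomasIsCoding and @Rui Barradas for the answers! They helped me better understand your code and compare the approaches.
@caproki Awesome!! Sure, all the answers help you understand interpolation :)

The plot for the non-preferred approx method would look like this:
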
library(ggplot2)

ggplot(df2, aes(Date, Accumulated)) +
  geom_point() +
  geom_line() +
  geom_point(data = df1, aes(Date, Accumulated), colour = "blue")