R: insert rows for missing data and interpolate


I have the following data frame in R:

        Date Accumulated
1 2016-10-01     6902000
2 2016-11-01     9033000
3 2017-06-01    15033000
4 2017-11-01    24033000
5 2019-05-01    24533000
6 2019-08-01    25033000
7 2019-11-01    27533000
8 2020-06-01    29033000

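I would like to add rows for the missing months in the Date column while applying linear or spline interpolation (preferably spline) to the Accumulated column. That is, I need rows for 2016-12-01, 2017-01-01, 2017-02-01, 2017-03-01, and so on.

I saw another question where people suggested using the zoo and data.table packages: first create the missing rows with NA, then apply the interpolation. I could not work out how to do that here, because my data is organised differently (all my dates are in a single column, unlike in that question). I am still fairly new to R, and handling the different data types and classes is quite hard for me. I am sure there is a simple way to do this.

Thanks a lot.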

Here is one way to do it with splines:

library(zoo)

#Data
df <- structure(list(Date = structure(c(17075, 17106, 17318, 17471, 
18017, 18109, 18201, 18414), class = "Date"), Accumulated = c(6902000L, 
9033000L, 15033000L, 24033000L, 24533000L, 25033000L, 27533000L, 
29033000L)), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "data.frame")

# Create a monthly sequence of dates spanning the data
df$Date <- as.Date(df$Date)
dfm <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "1 month"))
# Merge so that the missing months become rows with NA
dfmerged <- merge(dfm, df, by = "Date", all.x = TRUE)
# Fill the NA values by spline interpolation
dfmerged$Interpolation <- na.spline(dfmerged$Accumulated)
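
The merge-then-fill idea above can also be sketched without zoo. This is a minimal base R sketch, not the answer's code: it assumes that filling a plain vector with na.spline amounts to a cubic spline over the row positions 1, 2, ..., n (so months count as equally spaced), and it reproduces that with stats::spline. The names grid, known and fit are illustrative.

```r
# Known observations from the question
df <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)

# Full monthly grid, left-joined so that missing months get NA
grid <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "month"))
dfmerged <- merge(grid, df, by = "Date", all.x = TRUE)

# Cubic spline through the known rows, using the row position as x
# (assumption: this mirrors how a plain vector without an index is treated)
known <- !is.na(dfmerged$Accumulated)
fit <- spline(x = which(known), y = dfmerged$Accumulated[known],
              xout = seq_len(nrow(dfmerged)), method = "fmm")
dfmerged$Interpolation <- fit$y

# Number of rows that had to be filled in
sum(!known)
```

Because the x-values here are row positions rather than days, the result can differ slightly from a spline fitted on the dates themselves, since months do not all have the same length.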

You can try spline from base R, as shown below:
xout <- seq(as.Date("2016-10-01"), as.Date("2020-06-01"), by = "1 month")
yout <- with(df, spline(Date, Accumulated, xout = xout)$y)
setNames(data.frame(xout,yout),names(df))

Data
df <- structure(list(Date = structure(c(17075, 17106, 17318, 17471, 
18017, 18109, 18201, 18414), class = "Date"), Accumulated = c(6902000L,
9033000L, 15033000L, 24033000L, 24533000L, 25033000L, 27533000L,
29033000L)), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "data.frame")

df1 <- read.table(text = "
        Date Accumulated
1 2016-10-01     6902000
2 2016-11-01     9033000
3 2017-06-01    15033000
4 2017-11-01    24033000
5 2019-05-01    24533000
6 2019-08-01    25033000
7 2019-11-01    27533000
8 2020-06-01    29033000
", header = TRUE)

The following base R solution uses approxfun to create an interpolation function.
df1$Date <- as.Date(df1$Date)

f <- approxfun(df1$Date, df1$Accumulated)
d <- seq(min(df1$Date), max(df1$Date), by = "month")
df2 <- data.frame(Date = d, Accumulated = f(d))
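
To make the linear step concrete, the value for one missing month can be worked out by hand and compared with what the interpolation function returns. This is a small sketch under one stated choice: dates are converted with as.numeric (days since 1970-01-01) before being passed to approxfun, and the names x0, x1, manual and from_f are illustrative.

```r
df1 <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)

# Linear interpolation function over the dates, expressed as day counts
f <- approxfun(as.numeric(df1$Date), df1$Accumulated)

# 2016-12-01 lies between the observations of 2016-11-01 and 2017-06-01
x0 <- as.numeric(as.Date("2016-11-01")); y0 <- 9033000
x1 <- as.numeric(as.Date("2017-06-01")); y1 <- 15033000
x  <- as.numeric(as.Date("2016-12-01"))

# Straight line between the two neighbours: y0 + slope * elapsed days
manual <- y0 + (y1 - y0) * (x - x0) / (x1 - x0)
from_f <- f(x)

all.equal(manual, from_f)
```

The weight is the fraction of days elapsed between the two neighbouring observations (30 of 212 here), so the monthly steps are not exactly equal in size when months differ in length.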


Edit
Here is a solution using splinefun.
g <- splinefun(df1$Date, df1$Accumulated)
d <- seq(min(df1$Date), max(df1$Date), by = "month")
df3 <- data.frame(Date = d, Accumulated = g(d))

library(ggplot2)

ggplot(df3, aes(Date, Accumulated)) +
  geom_point() +
  geom_line() +
  geom_point(data = df1, aes(Date, Accumulated), colour = "blue")
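
Since the column holds accumulated totals, it may be worth checking whether the interpolated curve ever dips between observations. The sketch below is an optional variation, not part of the answer: it swaps in the method = "monoH.FC" option of splinefun (a monotone Hermite spline, which requires the input values to be monotone, as they are here). The names g_mono and df_mono are illustrative.

```r
df1 <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)
d <- seq(min(df1$Date), max(df1$Date), by = "month")

# Monotone spline: the curve is constrained not to decrease between knots
g_mono <- splinefun(as.numeric(df1$Date), df1$Accumulated, method = "monoH.FC")
df_mono <- data.frame(Date = d, Accumulated = g_mono(as.numeric(d)))

# Check whether the monthly series ever decreases
# (small tolerance to allow for floating-point rounding)
never_decreases <- all(diff(df_mono$Accumulated) >= -1e-6)
never_decreases
```

The same check can be run on df3 to see whether the default spline stays non-decreasing for this data.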

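Comments:
splinefun is probably what the OP should prefer :)
Thanks! Exactly what I wanted. Thanks also to ThomasIsCoding and @Rui Barradas for their answers! They helped me better understand your code and compare the approaches.
@caproki Awesome!! Of course, all the answers help with understanding interpolation :)

For comparison, the plot for the non-preferred linear (approxfun) result df2 would look like this: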
library(ggplot2)

ggplot(df2, aes(Date, Accumulated)) +
  geom_point() +
  geom_line() +
  geom_point(data = df1, aes(Date, Accumulated), colour = "blue")
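
Besides plotting, the two approaches can be compared numerically. The following is a rough sketch using approx and spline from base R directly; the data frame cmp and its column names are illustrative, and dates are converted with as.numeric so that both functions receive plain day counts.

```r
df1 <- data.frame(
  Date = as.Date(c("2016-10-01", "2016-11-01", "2017-06-01", "2017-11-01",
                   "2019-05-01", "2019-08-01", "2019-11-01", "2020-06-01")),
  Accumulated = c(6902000, 9033000, 15033000, 24033000,
                  24533000, 25033000, 27533000, 29033000)
)
d <- seq(min(df1$Date), max(df1$Date), by = "month")

# Evaluate both interpolants on the same monthly grid
lin <- approx(as.numeric(df1$Date), df1$Accumulated, xout = as.numeric(d))$y
spl <- spline(as.numeric(df1$Date), df1$Accumulated, xout = as.numeric(d))$y

cmp <- data.frame(Date = d, Linear = lin, Spline = spl, Diff = spl - lin)

# Month where the two methods disagree the most
cmp[which.max(abs(cmp$Diff)), ]
```

Both methods pass through the observed points, so Diff is zero on the original dates and non-zero only in the filled-in months.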
