R: polynomial regression not giving the line of best fit


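I am trying to create a polynomial equation by plotting known depths from two cores against each other on the x and y axes. In theory, this means I could enter a depth from one core into the equation and get the corresponding depth in the other core. Essentially, I am trying to correlate the two cores as closely as possible.

However, I am finding large deviations in my output values (I enter a value whose output I already know, and the result is very different). I am also concerned that the r² values are too high.

My main questions are:
1: Is this problem due to my lack of understanding of statistics, or to an error in the code?
2: Is what I am trying to achieve actually possible?
3: Do I just have to accept the seemingly large error?

Any help or advice would be greatly appreciated. I have been battling this on my own for a long time.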


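Hi Jake, please try to keep your post to a single question and provide a reproducible example. That makes it much easier to help you.

@majid Sorry! I will definitely try to clarify my questions. I am relatively new here, so I am still working out the best way to structure everything :-)

Unless you have reason to believe your model really is fourth order, you should avoid high-order polynomial fits; they are notoriously unstable outside the range of the fitted data points.

With only 7 or 8 data points to fit 5 terms, the risk of overfitting is high.

My assumption was that the higher the order of the polynomial, the better the accuracy. Clearly I was wrong. Cheers for the help.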
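The overfitting point raised in the comments can be checked directly. The sketch below is only an illustration: the tie-point depths are invented (they are not the values from Correlations.csv), and `loo_rmse` is a hypothetical helper name. It compares R² with a leave-one-out prediction error for degrees 1 to 4. R² can only go up as terms are added, so a high R² by itself says little about how well the curve predicts a depth it has not seen.

```r
# Hypothetical tie points (depths in cm), invented purely for illustration;
# they are NOT the values from Correlations.csv.
df <- data.frame(
  x = c(12, 20, 31, 42, 55, 63, 74, 85),
  y = c(10, 19, 27, 40, 51, 62, 70, 83)
)

# Leave-one-out cross-validation: refit without point i, then predict point i.
loo_rmse <- function(df, degree) {
  errs <- vapply(seq_len(nrow(df)), function(i) {
    fit <- lm(y ~ poly(x, degree, raw = TRUE), data = df[-i, ])
    df$y[i] - predict(fit, newdata = df[i, , drop = FALSE])
  }, numeric(1))
  sqrt(mean(errs^2))
}

degrees <- 1:4

# Out-of-sample error for each degree
cv <- vapply(degrees, function(d) loo_rmse(df, d), numeric(1))

# In-sample R^2 for each degree (never decreases as the degree grows)
r2 <- vapply(degrees, function(d) {
  summary(lm(y ~ poly(x, d, raw = TRUE), data = df))$r.squared
}, numeric(1))

data.frame(degree = degrees, r_squared = r2, loo_rmse = cv)
```

If the leave-one-out error stops improving (or gets worse) while R² keeps rising, the extra terms are most likely fitting noise in the tie points rather than the relationship between the cores.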
library(ggplot2)
library(tidyverse)
library(cowplot)

setwd("/Users/jakobparrish/Dropbox/Jakob/2019/Lake Nganoke-Thesis Prep/Core Work/Hyperspectral-Chlorophyl 'A'/Graphs R/Chlorophyll A")
# Read the tie-point depths once (the original read the same file twice, into SPEC and Correlations)
Correlations <- read_csv("Correlations.csv")

lm_eqn <- function(df, degree, raw=TRUE){
  m <- lm(y ~ poly(x, degree, raw=raw), df)  # get the fit
  cf <- round(coef(m), 5)  # round the coefficients
  r2 <- round(summary(m)$r.squared, 5)  # round the r.squared
  powers <- paste0("^", seq(length(cf)-1))  # create the powers for the equation
  powers[1] <- ""  # remove the first one as it's redundant (x^1 = x)
  # first check the sign of the coefficient and assign +/- and paste it with
  # the appropriate *italic(x)^power. collapse the list into a string
  pcf <- paste0(ifelse(sign(cf[-1])==1, " + ", " - "), abs(cf[-1]),
                paste0("*italic(x)", powers), collapse = "")
  # paste the rest of the equation together
  eq <- paste0("italic(y) == ", cf[1], pcf, "*','", "~italic(r)^2==", r2)
  eq
}

###############################
#Plots LC1U vs LC3U
df1 <- data.frame("x"=Correlations$LC3U, "y"=Correlations$LC1U)
df1 <- na.omit(df1)

p1v3 <- ggplot(df1, aes(x = x, y = y)) +
  geom_point()+
  labs(x ='LC3U [cm]', y ='LC1U [cm]', title = 'Core Correlations of Lake Nganoke LC1U & LC3U') +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw = TRUE), size = 1) +
  annotate("text", x = 10, y = 10, label = lm_eqn(df1, 2, raw = TRUE),
           hjust = 0, family = "Times", parse = TRUE) +
  scale_y_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90)) + #add limits in
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90)) +
  expand_limits(y=c(10,90),x=c(10,90)) +
  theme_classic()

p1v3

###############################
#Plots LC2U vs LC3U
df2 <- data.frame("x"=Correlations$LC3U, "y"=Correlations$LC2U)
df2 <- na.omit(df2)

p2v3 <- ggplot(df2, aes(x = x, y = y)) +
  geom_point()+
  labs(x ='LC3U [cm]', y ='LC2U [cm]', title = 'Core Correlations of Lake Nganoke LC2U & LC3U') +
  stat_smooth(method = "lm", formula = y ~ poly(x, 4, raw = TRUE), size = 1) +
  annotate("text", x = 10, y = 10, label = lm_eqn(df2, 4, raw = TRUE),
           hjust = 0, family = "Times", parse = TRUE) +
  scale_y_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90)) + #add limits in
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90)) +
  expand_limits(y=c(10,90),x=c(10,90)) +
  theme_classic() 


p2v3

#################################
#Plots both panels together

P_Correlations <- plot_grid(p1v3, p2v3, labels = "AUTO")

P_Correlations
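One possible contributor to the large deviations described in the question: `lm_eqn` rounds every coefficient to 5 decimal places before printing the equation. For a fourth-order term and depths approaching 90 cm, x^4 is in the tens of millions, so a rounding error of a few millionths in that coefficient can shift the result by many centimetres. Whether this is what happened in the original analysis is an assumption; the sketch below uses invented depths and a hypothetical helper, `eval_poly`, to show how to compare the two routes.

```r
# Hypothetical depths (cm), invented for illustration only;
# they are NOT the values from Correlations.csv.
df <- data.frame(
  x = c(12, 20, 31, 42, 55, 63, 74, 85),
  y = c(10, 19, 27, 40, 51, 62, 70, 83)
)

fit <- lm(y ~ poly(x, 4, raw = TRUE), data = df)

# Evaluate a polynomial from coefficients given in increasing order of power
# (intercept first), which is the order coef() returns for a raw poly() fit.
eval_poly <- function(cf, x) {
  vapply(x, function(xi) sum(cf * xi^(seq_along(cf) - 1)), numeric(1))
}

new_x <- c(25, 50, 80)

# Route 1: let the fitted model do the conversion
from_model <- unname(predict(fit, newdata = data.frame(x = new_x)))

# Route 2: plug depths into the equation by hand
from_full    <- eval_poly(coef(fit), new_x)            # full-precision coefficients
from_rounded <- eval_poly(round(coef(fit), 5), new_x)  # coefficients as printed on the plot

data.frame(x = new_x, from_model, from_full, from_rounded,
           rounding_error = from_rounded - from_model)
```

Using `predict()` on the model object avoids the issue entirely, since the printed equation is then only a label on the plot and never the thing used for converting depths.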