Quadratic fit for grouped data in R

r model

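I have found plenty of help on fitting models, but I keep running into one specific problem because of the way my data is organized. The data comes from an introductory statistics book and is supposed to represent sample error counts as a function of the dose, in milligrams, of some drug.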

|-----|-------|-------|-------|
| 0mg | 100mg | 200mg | 300mg |
|-----|-------|-------|-------|
| 25  |  16   |   6   |   8   |
| 19  |  15   |  14   |  18   |
| 22  |  19   |   9   |   9   |
| 15  |  11   |   5   |  10   |
| 16  |  14   |   9   |  12   |
| 20  |  23   |  11   |  13   |

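The scores appear to dip around group C (the 200mg column) and then rise a little at group D (the 300mg column), which is why I am looking for a quadratic fit.

I tried the following: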

scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)

x_groups = rep(c(0,100, 200, 300), each = 6)
scores.quadratic = lm(scores ~ poly(x_groups, 2, raw = TRUE))

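I can then inspect the result with the summary() function. I am confused about the lm() function and how it is supposed to fit a quadratic. My understanding is that it takes each entry of x_groups and squares it, then does a linear fit on the new vector, but I don't think that is correct.

Could someone give feedback on how to fit a quadratic curve to my data or, if that is not what I have done, help me understand where I went wrong?

Thanks, everyone.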

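Let's go through your reasoning step by step. First, you spotted the dip by looking at the numbers for group C. A better way to see it is to plot the data: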

library(ggplot2)
library(dplyr)

scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
           6, 14,  9,  5,  9, 11,
           8, 18,  9, 10, 12, 13)

x_groups = rep(c(0,100, 200, 300), each = 6)

# create dataset
d1 = data.frame(scores, x_groups) 

# calculate average scores for each group
d2 = d1 %>% group_by(x_groups) %>% summarise(Avg = mean(scores))

# plot them
ggplot() + 
  geom_point(data = d1, aes(x_groups, scores)) +
  geom_line(data = d2, aes(x_groups, Avg), col="blue")

Now you can see the dip, and that is the pattern you want to model.
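If you want numbers rather than a plot, the group means tell the same story. This is a minimal base-R sketch that reuses the scores and x_groups vectors from above; the name group_means is only illustrative.

```r
scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)
x_groups = rep(c(0, 100, 200, 300), each = 6)

# mean score per dose; group_means is an illustrative name
group_means = tapply(scores, x_groups, mean)
round(group_means, 2)
# the means are 19.50, 16.33, 9.00 and 11.67 for 0, 100, 200 and 300 mg
```

The mean falls from 0mg to 200mg and comes back up slightly at 300mg, which is the dip visible in the blue line.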

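Next, you want to fit your quadratic model. Remember that a quadratic is just a special case of a polynomial, namely one of degree 2. A polynomial fit of degree n in a variable x fits intercept + x + x^2 + x^3 + ... + x^n, with one coefficient per term. A quadratic therefore fits intercept + x + x^2, and those are exactly the three coefficients you get in the model output: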

scores.quadratic = lm(scores ~ poly(x_groups, 2, raw = TRUE))
summary(scores.quadratic)

# Call:
#   lm(formula = scores ~ poly(x_groups, 2, raw = TRUE))
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -6.1250 -2.3333 -0.2083  1.8542  8.7917 
# 
# Coefficients:
#                                    Estimate Std. Error t value Pr(>|t|)    
#   (Intercept)                    20.2083333  1.5925328  12.689 2.58e-11 ***
#   poly(x_groups, 2, raw = TRUE)1 -0.0745833  0.0255747  -2.916  0.00825 ** 
#   poly(x_groups, 2, raw = TRUE)2  0.0001458  0.0000817   1.785  0.08870 .  
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 4.002 on 21 degrees of freedom
# Multiple R-squared:  0.4999,  Adjusted R-squared:  0.4523 
# F-statistic:  10.5 on 2 and 21 DF,  p-value: 0.0006919

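The coefficient of the quadratic term is 0.0001458. It looks close to zero, but keep in mind that it multiplies x^2, which goes up to 90000 here. It is statistically significantly different from zero at the 0.1 level (p-value = 0.08870), though not at the 0.05 level. So the model does pick up the dip.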
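To see what lm() actually does with poly(x_groups, 2, raw = TRUE), it can help to write the same model with the terms spelled out. The sketch below assumes the same data as above; fit_poly and fit_I are names I made up for the comparison. The I() wrapper is needed because ^ has a special meaning inside an R formula.

```r
scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)
x_groups = rep(c(0, 100, 200, 300), each = 6)

# the model from the question
fit_poly = lm(scores ~ poly(x_groups, 2, raw = TRUE))

# the same model with the linear and squared terms written explicitly
fit_I = lm(scores ~ x_groups + I(x_groups^2))

# the design matrix has three columns: a column of ones, x, and x^2
head(model.matrix(fit_I), 3)

# the estimates agree; only the coefficient names differ
all.equal(unname(coef(fit_poly)), unname(coef(fit_I)))
```

So x is not replaced by its square. The squared column is added next to the original one, and lm() estimates a separate coefficient for each.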

The fit can be plotted as follows:

# plot the model
ggplot(d1, aes(x_groups, scores)) + 
  geom_point() +
  geom_smooth(formula = y ~ poly(x, 2, raw = TRUE),
              method = "lm")

You can think of it as a smoothed version of the actual pattern (the first plot).
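As a rough check on where the fitted curve bottoms out, the coefficients can be plugged into intercept + b1*x + b2*x^2 by hand. This is only a sketch under the same data; fit, b, doses, fitted_at_doses and x_min are illustrative names, and the minimum is just the vertex of the fitted parabola, not a claim about the drug.

```r
scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)
x_groups = rep(c(0, 100, 200, 300), each = 6)

fit = lm(scores ~ poly(x_groups, 2, raw = TRUE))
b = unname(coef(fit))  # b[1] = intercept, b[2] = linear, b[3] = quadratic

# value of the fitted curve at each dose: b0 + b1*x + b2*x^2
doses = c(0, 100, 200, 300)
fitted_at_doses = b[1] + b[2] * doses + b[3] * doses^2
round(fitted_at_doses, 2)

# dose at which the fitted parabola reaches its minimum
x_min = -b[2] / (2 * b[3])
round(x_min, 1)
```

With these data the vertex lands between 200mg and 300mg, which matches the shape of the smoothed curve in the plot.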

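A quadratic is a special case of a polynomial, namely one of degree 2. A polynomial fit of degree n in a variable x fits intercept + x + x^2 + x^3 + ... + x^n. A quadratic therefore fits intercept + x + x^2, and those are exactly the coefficients you get in the model output. It looks like you expected it to be intercept + x^2.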

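To make the difference between the two forms concrete, here is a small sketch that fits both and compares them, assuming the same data as in the question. The names fit_sq_only and fit_quadratic are mine, not from the original post.

```r
scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)
x_groups = rep(c(0, 100, 200, 300), each = 6)

# intercept + x^2 only: the model described in the question
fit_sq_only = lm(scores ~ I(x_groups^2))

# intercept + x + x^2: what poly(x_groups, 2, raw = TRUE) fits
fit_quadratic = lm(scores ~ x_groups + I(x_groups^2))

length(coef(fit_sq_only))    # two coefficients
length(coef(fit_quadratic))  # three coefficients

# F test for the nested models: does the linear term improve the fit?
anova(fit_sq_only, fit_quadratic)
```

Dropping the linear term forces the turning point of the parabola to sit at x = 0, so the squared-only model cannot place the dip near 200mg. That is the practical reason to keep all three terms.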