Finding relationships in data with R


r statistics


I have some data:

   df <- structure(list(salary = c(32368L, 53174L, 52722L, 53423L, 50602L, 
  49033L, 24395L, 24395L, 43124L, 23975L, 53174L, 58515L, 56294L, 
  49033L, 44884L, 53429L, 46574L, 58968L, 53174L, 53627L, 49033L, 
  54981L, 62530L, 27525L, 24395L, 56884L, 52111L, 44183L, 24967L, 
  35423L, 41188L, 27525L, 35018L, 44183L, 35423L), experience = c(3L, 
  10L, 10L, 1L, 5L, 10L, 5L, 6L, 8L, 4L, 4L, 8L, 10L, 10L, 1L, 
  5L, 8L, 10L, 5L, 10L, 5L, 7L, 10L, 3L, 5L, 10L, 5L, 5L, 6L, 4L, 
  2L, 3L, 1L, 2L, 1L)), .Names = c("salary", "experience"), class = "data.frame", row.names = c("1", 
  "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
  "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
  "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35"
  ))

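I need to find a statistical law that describes the relationship between salary and experience. I thought it was a quadratic reciprocal relationship, but when I draw a scatterplot I don't see any relationship between these variables. I think I could split the data up and look at the relationship within each part.

But I don't know how to do that.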
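One way to "split the data", sketched here on the assumption that grouping by experience level is what is wanted: split() groups the salaries by each distinct experience value, boxplot() shows the spread within each group, and a Spearman rank correlation checks for a monotonic trend without assuming any particular curve shape. The two vectors are just a compact rebuild of the df from the dput above.

```r
# Rebuild the question's data frame (same values as the dput above)
salary <- c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395, 43124,
            23975, 53174, 58515, 56294, 49033, 44884, 53429, 46574, 58968,
            53174, 53627, 49033, 54981, 62530, 27525, 24395, 56884, 52111,
            44183, 24967, 35423, 41188, 27525, 35018, 44183, 35423)
experience <- c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1, 5, 8, 10,
                5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4, 2, 3, 1, 2, 1)
df <- data.frame(salary = salary, experience = experience)

# Split the salaries into one group per distinct experience level
salary_by_exp <- split(df$salary, df$experience)

# Summarise each group: number of people and mean salary
group_summary <- data.frame(
  experience  = as.numeric(names(salary_by_exp)),
  n           = sapply(salary_by_exp, length),
  mean_salary = sapply(salary_by_exp, mean)
)
print(group_summary)

# Visual check of the salary spread within each experience group
boxplot(salary ~ experience, data = df,
        xlab = "experience", ylab = "salary")

# Rank correlation: picks up a monotonic relationship of any shape
rho <- cor(df$salary, df$experience, method = "spearman")
print(rho)
```

Some groups are very small (for example, only one person has 7 years of experience), so the per-group means should be read with caution.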

Have you tried anything? Something like a simple lm:

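plot(experience ~ salary, df)       # scatterplot: salary on x, experience on y
mod <- lm(experience ~ salary, df)  # straight-line fit
abline(mod)                         # add the fitted line to the plot
summary(mod)                        # prints the coefficient table below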

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -1.904e-01  1.801e+00  -0.106  0.91646   
salary       1.346e-04  3.931e-05   3.424  0.00167 **
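With a single predictor, the t-test on the salary slope in the summary(mod) output is equivalent to a test of the Pearson correlation between the two variables, so cor.test() should report the same p-value, and R-squared equals the squared correlation. A small check, rebuilding df as in the question:

```r
salary <- c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395, 43124,
            23975, 53174, 58515, 56294, 49033, 44884, 53429, 46574, 58968,
            53174, 53627, 49033, 54981, 62530, 27525, 24395, 56884, 52111,
            44183, 24967, 35423, 41188, 27525, 35018, 44183, 35423)
experience <- c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1, 5, 8, 10,
                5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4, 2, 3, 1, 2, 1)
df <- data.frame(salary, experience)

# The answer's model: experience regressed on salary
mod <- lm(experience ~ salary, data = df)
slope_p <- summary(mod)$coefficients["salary", "Pr(>|t|)"]

# Pearson correlation test on the same two variables
ct <- cor.test(df$salary, df$experience)

# Both p-values come from the same t statistic with n - 2 degrees of freedom
print(c(slope_p = slope_p, cor_p = ct$p.value))

# Share of the variance in experience explained by salary
print(summary(mod)$r.squared)
```

A significant slope only says the straight-line trend is unlikely to be zero; a small R-squared would still mean salary explains little of the scatter.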


You can try other models, for example one with a quadratic term:

mod2 <- lm(experience ~ salary + I(salary^2), df)                # add a squared salary term
new_salary <- seq(min(df$salary), max(df$salary), length.out = 50)  # grid of salaries
pred_experience <- predict(mod2, newdata = data.frame(salary = new_salary))
lines(new_salary, pred_experience)                               # draw the fitted curve on the existing plot
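To judge whether the quadratic term in mod2 actually helps, rather than eyeballing the curve, the two nested models can be compared with an F-test via anova(), and with AIC, which penalises the extra parameter. A sketch, rebuilding df as in the question:

```r
salary <- c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395, 43124,
            23975, 53174, 58515, 56294, 49033, 44884, 53429, 46574, 58968,
            53174, 53627, 49033, 54981, 62530, 27525, 24395, 56884, 52111,
            44183, 24967, 35423, 41188, 27525, 35018, 44183, 35423)
experience <- c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1, 5, 8, 10,
                5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4, 2, 3, 1, 2, 1)
df <- data.frame(salary, experience)

mod  <- lm(experience ~ salary, data = df)                # straight line
mod2 <- lm(experience ~ salary + I(salary^2), data = df)  # adds a quadratic term

# F-test: does adding I(salary^2) reduce the residual sum of squares enough?
cmp <- anova(mod, mod2)
print(cmp)

# AIC for both models; lower is better
print(AIC(mod, mod2))
```

A large p-value in the anova table would be consistent with the observation in the comments that the quadratic term gives no visible gain.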

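Welcome to Stack Overflow! Please read the relevant information and how to give a reproducible example. This will make it easier for others to help you.

Is this a linear relationship? You could try lm(experience ~ I(salary^2), df) and lm(experience ~ salary + I(salary^2), df), but there is no obvious/visible gain.

I don't know why, but I see the same graph.

Strangely, abline isn't happy when the model has more than two coefficients: it only uses the first two, so it still draws a straight line. You may need to use predict and lines instead. For example, draw plot(experience ~ salary, df) and then run the mod2 code above.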
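The question mentions a reciprocal form. Assuming that means experience modelled as a linear function of 1/salary (this interpretation is an assumption, not something stated in the thread), it can be fitted with lm() and drawn with the same predict() plus lines() approach, since abline() can only draw straight lines. The name mod_recip is hypothetical.

```r
salary <- c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395, 43124,
            23975, 53174, 58515, 56294, 49033, 44884, 53429, 46574, 58968,
            53174, 53627, 49033, 54981, 62530, 27525, 24395, 56884, 52111,
            44183, 24967, 35423, 41188, 27525, 35018, 44183, 35423)
experience <- c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1, 5, 8, 10,
                5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4, 2, 3, 1, 2, 1)
df <- data.frame(salary, experience)

# Hypothetical reciprocal model: experience = b0 + b1 * (1 / salary)
mod_recip <- lm(experience ~ I(1 / salary), data = df)
print(summary(mod_recip)$coefficients)

# abline() can only draw straight lines, so evaluate the fitted curve
# on a grid of salaries and draw it with lines()
plot(experience ~ salary, data = df)
grid <- data.frame(salary = seq(min(df$salary), max(df$salary), length.out = 50))
grid$fit <- predict(mod_recip, newdata = grid)
lines(grid$salary, grid$fit, col = "red")

# Compare with the straight-line model on the same response (lower AIC is better)
mod <- lm(experience ~ salary, data = df)
print(AIC(mod, mod_recip))
```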