How do I write a function in R that implements the "best subset" model selection method?

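I need to write a function that takes a data frame as input. The columns are my explanatory variables, except for the last (rightmost) column, which is the response variable. I am trying to fit linear models and track the adjusted R-squared of each one as the criterion for choosing the best model.

The models will use the columns as explanatory variables (except the rightmost column, which will be the response variable).

The function should create a tibble containing a column of model numbers (I don't know what this means), the subset of explanatory variables together with the response variable, the model formula, the result of fitting the linear model, and anything else that is needed.

The function should output: the model number, the explanatory variables in the model, the value of the adjusted R-squared, and a plot (I can draw the plot myself). I have a picture of a table here that helps to visualize the result.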

I found this code, which gives me the explanatory variables and the response variable:

  cols <- colnames(data)
  # Get the response variable.
  y <- tail(cols, 1)
  # Get a list of the explanatory variables.
  xs <- head(cols, length(cols) - 1)
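
The names extracted this way can be turned into a model formula. The following is a minimal sketch, not the asker's code: it assumes the data frame has the response in its last column, uses a few `mtcars` columns purely as stand-in data, and builds the formula with base R's `reformulate()` before reading the adjusted R-squared from `summary()`.

```r
# Stand-in data (an assumption for illustration): response `mpg` is the last column.
data <- mtcars[, c("wt", "hp", "qsec", "mpg")]

cols <- colnames(data)
y  <- tail(cols, 1)                  # response: last column
xs <- head(cols, length(cols) - 1)   # explanatory variables: all other columns

# reformulate() builds the formula mpg ~ wt + hp + qsec from the character vectors.
full_formula <- reformulate(xs, response = y)
full_fit <- lm(full_formula, data = data)

# Adjusted R-squared of the model that uses every explanatory variable.
adj_r2 <- summary(full_fit)$adj.r.squared
print(full_formula)
print(adj_r2)
```

Passing a subset of `xs` to `reformulate()` gives the formula for any smaller model in the same way.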

I don't fully understand your description, but I think I understand your goal. Maybe this helps to some extent:

library(tidyverse)
library(broom)
library(data.table)

lm_func <- function(df){
  # Column 1 is the response; columns 2 and 3 are the candidate predictors.
  fit1 <- lm(df[, 1] ~ df[, 2], data = df)
  fit2 <- lm(df[, 1] ~ df[, 3], data = df)
  fit3 <- lm(df[, 1] ~ df[, 2] + df[, 3], data = df)
  results <- list(fit1, fit2, fit3)
  names(results) <- paste0("explanatory_variables_", 1:3)
  # glance() returns one row of fit statistics per model, including adj.r.squared.
  r_sq <- lapply(results, function(x){
    glance(x)
  })
  r_sq_df <- rbindlist(r_sq, idcol = "df_name")
  r_sq_df
}
lm_func(iris)
This assumes that your first column is y and that you want to build a model for each additional column.

OK, one more update. I think this does everything that was asked, though it may be overkill:

library(combinat)
library(data.table)
library(tidyverse)
library(broom)

# The first function takes a data frame plus the dependent and independent variables,
# given as single-bracket column selections (by variable name or column position).
# It returns a list of data frames, one for every possible ordering of the independent
# variables (y ~ x1 + x2 ..., y ~ x2 + x1 ...), so that the model can be run on
# every possible sequence of explanatory variables.
formula_func <- function(df, dependent = df["Sepal.Length"], independents = df[c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")]) {
  independents_df_list <- permn(independents) # length of output should be the factorial of the number of independent variables
  df_list <- lapply(independents_df_list, function(x){ # put the dependent variable in the first column of each df
    cbind(dependent, x)
  })
  df_list
}
permd_df_list <- formula_func(iris) # voila

# The second function takes the output of the first and fits lm() adding one
# variable at a time: (y ~ x1), (y ~ x1 + x2), and so on, for every ordering.
# Each row of the final output records the model formula, which shows the order
# in which the explanatory variables were added.
lm_func <- function(form_df_list) {
  lapply(form_df_list, function(x) {
    lst <- vector(mode = "list", length = ncol(x) - 1)
    for (i in 2:ncol(x)) {
      form_df <- x[, 1:i]
      form <- DF2formula(form_df) # first column ~ sum of the remaining columns
      fit <- lm(form, data = x)
      lst[[i - 1]] <- glance(fit)
      # collapse guards against deparse() splitting a long formula into several strings
      names(lst)[i - 1] <- paste(deparse(form), collapse = " ")
    }
    rbindlist(lst, idcol = "Model_formula")
  })
}
everything_list <- lm_func(permd_df_list) # VOILA!!!
# Remove duplicates and return a single df
everything_list_distinct <- everything_list %>%
  rbindlist() %>%
  distinct()

## You can now subset and select whichever column you want from the final output
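
The permutation approach fits the same set of variables several times: `y ~ x1 + x2` and `y ~ x2 + x1` have different formula strings, so both survive `distinct()` even though they are the same model. With four predictors the 24 orderings produce 96 fits, while there are only 2^4 - 1 = 15 non-empty subsets. As an alternative, here is a hedged sketch that enumerates each subset exactly once with base R's `combn()`. The function name `best_subsets`, the output column names, and the `mtcars` stand-in data are assumptions made for illustration; it follows the question's convention that the last column is the response.

```r
library(tibble)

# Hypothetical helper: one row per non-empty subset of the explanatory variables.
best_subsets <- function(data) {
  cols <- colnames(data)
  y  <- tail(cols, 1)                  # response: last column
  xs <- head(cols, length(cols) - 1)   # explanatory variables

  # All subsets of size 1, 2, ..., length(xs), each as a character vector.
  subsets <- unlist(
    lapply(seq_along(xs), function(k) combn(xs, k, simplify = FALSE)),
    recursive = FALSE
  )

  formulas <- lapply(subsets, reformulate, response = y)
  fits <- lapply(formulas, lm, data = data)

  tibble(
    model     = seq_along(subsets),    # model number
    variables = vapply(subsets, paste, character(1), collapse = ", "),
    formula   = vapply(formulas, function(f) paste(deparse(f), collapse = " "),
                       character(1)),
    fit       = fits,                  # list-column holding the lm objects
    adj_r2    = vapply(fits, function(f) summary(f)$adj.r.squared, numeric(1))
  )
}

# Stand-in data: response `mpg` in the last column.
example_data <- mtcars[, c("wt", "hp", "qsec", "mpg")]
res <- best_subsets(example_data)

# Row with the highest adjusted R-squared.
print(res[which.max(res$adj_r2), c("model", "variables", "adj_r2")])
```

The number of subsets doubles with every added column, so this exhaustive search is only practical for a modest number of explanatory variables.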

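Thank you very much for your answer. I will keep working with your help and see whether this gets me closer to what I need. I know the model doesn't make much sense, but unfortunately I didn't choose it.

Feel free to update your question.

There is now an updated answer above, but the code is horrible to understand if you are new to this! A simpler solution could probably be provided :D

I got your solution to work! I will accept your answer as correct. Obviously I had to change the source code slightly for my specific case. Sorry the question was so hard to understand. I tried to describe it as well as I could without having to put it in context. Thank you very much.

Awesome. If you use the function that outputs everything, you can filter for unique rows.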
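A version that loops over the columns, adding one explanatory variable at a time in column order:

library(broom)
library(data.table)

lm_func <- function(df) {
  # One slot per model: y ~ x1, y ~ x1 + x2, ...
  lst <- vector(mode = "list", length = ncol(df) - 1)
  for (i in 2:ncol(df)) {
    form_df <- df[, 1:i]
    form <- DF2formula(form_df) # first column ~ sum of columns 2..i
    fit <- lm(form, data = df)
    lst[[i - 1]] <- glance(fit)
  }
  names(lst) <- paste0("explanatory_variables_", seq_along(lst))
  rbindlist(lst, idcol = "df_name")
}
lm_func(iris)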