How to run a linear regression for each industry-year, excluding firm i's observations, in R?


Here is the dput output of my dataset in R:

data1<-structure(list(Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 
1999, 2002, 1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000), 
    `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D", 
    "E", "E", "F", "F", "G", "G", "H", "H"), Industry = c("AUTO", 
    "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", 
    "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", 
    "Pharma", "Pharma"), X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 
    13, 15, 16, 17, 18, 19, 20, 21), Y = c(30, 31, 34, 35, 36, 
    38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50), Z = c(23, 
    29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 
    137, 143)), row.names = c(NA, -17L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`1` = 1L), class = "omit"))
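A leave-one-firm-out regression only has data to fit when an industry-year cell contains at least two firms. As a quick check, here is a minimal base-R sketch that rebuilds the data above as a plain data.frame and counts the distinct firms per Industry-Year cell. The object names cells and firms_per_cell are illustrative, not from the question.

```r
# Rebuild the example data as a plain data.frame (same values as the dput above)
data1 <- data.frame(
  Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 1999, 2002,
           1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000),
  `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D",
                  "E", "E", "F", "F", "G", "G", "H", "H"),
  Industry = rep(c("AUTO", "Pharma"), times = c(9, 8)),
  X = c(1, 2, 5, 6, 7, 9:13, 15:21),
  Y = c(30, 31, 34, 35, 36, 38:42, 44:50),
  Z = c(23, 29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 137, 143),
  check.names = FALSE  # keep the column name "Firm name" with its space
)

# One row per distinct (Industry, Year, firm) combination
cells <- unique(data1[, c("Industry", "Year", "Firm name")])

# Number of distinct firms in each Industry-Year cell
firms_per_cell <- aggregate(list(n_firms = cells$`Firm name`),
                            by = list(Industry = cells$Industry, Year = cells$Year),
                            FUN = length)

# Cells where dropping one firm still leaves at least one other firm
firms_per_cell[firms_per_cell$n_firms >= 2, ]
```

On this sample only three cells (AUTO 1998, AUTO 1999 and Pharma 1998) contain two or more firms; every other industry-year holds a single firm, so excluding that firm leaves nothing to regress on.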

The data1 output is hard to read, but based on what you describe, I would suggest nested loops, for example:

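for (y in unique(data1$Year)) {
  for (comp in unique(data1$`Firm name`[data1$Year == y])) {
    # transform data: keep only firms in comp's industry in year y, but exclude comp
    ind <- data1$Industry[data1$`Firm name` == comp][1]
    sub_data <- data1[data1$Year == y & data1$Industry == ind & data1$`Firm name` != comp, ]
    if (nrow(sub_data) > 0) fit <- lm(Y ~ X + Z, data = sub_data)  # store fit as needed
  }
}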

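As @bonedi mentioned, you can do this with nested loops. If you want a model for each industry-year combination, subset the data by Industry and Year. Within each subset, loop over the Firm name values and exclude that firm before fitting the model. The results can be stored in a list named by industry-year-firm. It isn't a great solution, but it should get you closer.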

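lst <- list()

# Loop over industries, then over the years observed within each industry
for (ind in unique(data1$Industry)) {
  for (year in unique(data1[data1$Industry == ind, ]$Year)) {
    # Firms present in this industry-year
    for (firm in unique(data1[data1$Industry == ind & data1$Year == year, ]$`Firm name`)) {
      # Same industry-year, with firm i's observations removed
      sub_data <- data1[data1$Industry == ind & data1$Year == year & data1$`Firm name` != firm, ]
      # Skip cells where no other firm is left to fit on
      if (nrow(sub_data) > 0) {
        name <- paste(ind, year, firm, sep = '-')
        lst[[name]] <- lm(Y ~ X + Z, data = sub_data)
      }
    }
  }
}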

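lst

For multiple equations there is a nice tidyverse approach. First, a loop could generate n data frames, each excluding firm i and carrying a column that records which firm was excluded; then bind their rows into one data frame and pass it to the tidyverse multiple-equations code. Another benefit is that the data and models would then be ready for ggplot().

@Ben… Thanks for your help. I was able to use the nested loops. The only problem is that my dataset is fairly large, and the three nested loops take a long time to run. Is there a way to make the nested-loop code above more efficient or robust?

Have you looked into a faster lm? See this post: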
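One way to approach the speed question, sketched in base R under the assumption that only the coefficients are needed: split the row indices by industry-year once with split(), build the design matrix once with model.matrix(), and call the low-level stats function lm.fit() inside the loop instead of lm(). This avoids re-scanning the whole data set for every industry, year and firm and skips formula parsing on each fit; how much time it saves on a large data set is not measured here. The output is a single stacked data frame with a column naming the excluded firm, in the spirit of the tidyverse suggestion above, and it can be passed straight to ggplot(). The names X_all, cells, results and coefs are illustrative.

```r
# Rebuild the example data (same values as the dput in the question)
data1 <- data.frame(
  Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 1999, 2002,
           1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000),
  `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D",
                  "E", "E", "F", "F", "G", "G", "H", "H"),
  Industry = rep(c("AUTO", "Pharma"), times = c(9, 8)),
  X = c(1, 2, 5, 6, 7, 9:13, 15:21),
  Y = c(30, 31, 34, 35, 36, 38:42, 44:50),
  Z = c(23, 29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 137, 143),
  check.names = FALSE
)

# Design matrix (intercept, X, Z) and response, built once for all rows
X_all <- model.matrix(~ X + Z, data = data1)
y_all <- data1$Y

# Row indices of each Industry-Year cell, computed once; empty cells dropped
cells <- split(seq_len(nrow(data1)), list(data1$Industry, data1$Year), drop = TRUE)

results <- list()
for (cell in names(cells)) {
  rows <- cells[[cell]]
  for (firm in unique(data1$`Firm name`[rows])) {
    keep <- rows[data1$`Firm name`[rows] != firm]  # drop firm i's rows
    if (length(keep) == 0) next                    # single-firm cell: nothing left to fit
    fit <- lm.fit(X_all[keep, , drop = FALSE], y_all[keep])
    results[[length(results) + 1]] <- data.frame(
      cell = cell, excluded_firm = firm, n_obs = length(keep),
      term = names(fit$coefficients), estimate = unname(fit$coefficients)
    )
  }
}

# One stacked data frame: a row per (cell, excluded firm, coefficient)
coefs <- do.call(rbind, results)
```

lm.fit() returns a bare list, so summary(), standard errors and predict() are not available from it; if those are needed, keep lm() and only borrow the split-once idea. Note also that in this sample Z equals 6X + 17 and Y equals X + 29 exactly, so Z is perfectly collinear with X and its coefficient comes back NA (lm() reports the same).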