R 回归循环中的国家年固定效应（只有一个水平）_R_Datatable_Lapply_Lm

R 回归循环中的国家年固定效应（只有一个水平）

R 回归循环中的国家年固定效应（只有一个水平）,r,datatable,lapply,lm,R,Datatable,Lapply,Lm,我使用了以下代码来运行回归： res <- lm (c241 ~ x + matchcode , data = df) summary(res) dfoutput<- data.table(df)[, .(nm = names(.SD),dffits= lapply(.SD, function(x) if(is.numeric(x)) summary(lm(c241~ x + matchcode, na.action=na.omit)))), .SDcols = -1] 我想循环

我使用了以下代码来运行回归：

res <- lm (c241 ~ x + matchcode , data = df)
summary(res)

dfoutput<- data.table(df)[, .(nm = names(.SD),dffits= lapply(.SD, function(x) if(is.numeric(x)) summary(lm(c241~ x + matchcode, na.action=na.omit)))), .SDcols = -1]

我想循环遍历变量，如下所示

或者我将lappy与

matchcode

一起使用：

df <- lapply( df[,-1], function(x) summary(lm(df$c241~ x + df$matchcode)) )

据我所知，您尝试过进行特征选择，但选择了一种复杂的方法，它使您的输出

NULL

。我同意，你有这么多的特征，你需要在回归之前进行特征选择

有非常突出的特征选择方法，如随机森林。随机林帮助您检测最佳预测值

考虑到我对预测植物种类很感兴趣，但我不知道哪种特征可以预测得最好（

Sepal.Length

，

Sepal.Width

，

Petal.Length

，

Petal.Width

）。因此，下面的代码指定了最佳预编码器：

library(party)
colnames(iris)
cf1 <- cforest(Species ~ . , data= iris, control=cforest_unbiased(mtry=2,ntree=50))

varimp(cf1)

在

lm（y~x+matchcode，data=df）

中，您应该有

和

，但我在您的dataset@Salman我使用

来简化示例（c241将是实际的y）。

只是函数

。它应用循环中的每个变量。很遗憾，我还没能检验你的答案。我将在早上做的第一件事就是这样做。我用一个回归进行了尝试，得到了所需的输出（使用所有不同的匹配代码）。我知道Lappy失败了，这也是data.table第1行和第4行生成NULL的原因。我只是想知道如何解决这个问题。谢谢你的回答！它起作用了！这并不理想，因为它迫使我首先对数据进行子集划分（不允许响应变量中缺少值），这在我的情况下会造成工作区内存问题。是否可以告诉函数只选择函数中的非NA值，这样我就不必子集？此外，您能否详细介绍一下control=cforest_unbiased（mtry=2，ntree=50）？@TomKisters我很高兴它能帮助您。关于

NA

s，您应该在开始做任何事情之前清理数据。卸下或更换NAs也是这一步骤的一部分。在理想情况下，是的。但我在每个数据库中都有多个遵从性变量。除了lm函数和其他许多函数外，处理NA没有任何问题。在我的项目中，删除NA不是很理想。@TomKisters也应该有方法来替换

NA

s。

df <- lapply( df[,-1], function(x) summary(lm(df$c241~ x + df$matchcode)) )

 Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

 ES1sample <- dput(head(ES1sample[, ],10))
structure(list(matchcode = structure(c("BGD 2002 ", "BRA 2003 ", 
"KHM 2003 ", "CHN 2002 ", "CHN 2003 ", "ECU 2003 ", "ERI 2002 ", 
"ETH 2002 ", "GTM 2003 ", "HND 2003 "), label = "", class = c("labelled", 
"character")), idstd = structure(c(2760, 14273, 16104, 17039, 
19095, 22207, 23063, 24046, 25420, 26212), label = "WEB STD FIRMID", format.stata = "%5.0f", class = c("labelled", 
"numeric")), year = structure(c(2002, 2003, 2003, 2002, 2003, 
2003, 2002, 2002, 2003, 2003), format.stata = "%9.0g", label = "", class = c("labelled", 
"numeric")), country = structure(c("Bangladesh2002", "Brazil2003", 
"Cambodia2003", "China2002", "China2003", "Ecuador2003", "Eritrea2002", 
"Ethiopia2002", "Guatemala2003", "Honduras2003"), label = "Country", format.stata = "%21s", class = c("labelled", 
"character")), wt = structure(c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), label = "locations and sectors weights", format.stata = "%9.0g", class = c("labelled", 
"numeric")), region = structure(c(6, 4, 2, 2, 2, 4, 1, 1, 4, 
4), label = "", class = c("labelled", "numeric")), income = structure(c(1, 
2, 1, 2, 2, 2, 1, 1, 2, 2), label = "income grouping for survey year", class = c("labelled", 
"numeric")), industry = structure(c(1, 1, 20, 3, 20, 12, 7, 3, 
NA, 3), label = "Industry", class = c("labelled", "numeric")), 
    sector = structure(c(1, 1, 2, 1, 2, 1, 1, 1, 2, 1), label = "Sector", class = c("labelled", 
    "numeric")), ownership = structure(c(2, 2, 2, 2, 2, 2, 2, 
    2, 1, 1), label = "Ownership", class = c("labelled", "numeric"
    )), exporter = structure(c(2, 2, 2, 2, 2, 1, 2, 2, 2, 1), label = "Export", class = c("labelled", 
    "numeric")), c201 = structure(c(1991, 1993, 1999, 1979, 1998, 
    1997, 1996, 1998, 1990, 1998), label = "Year firm began operations in this country", format.stata = "%4.0f", class = c("labelled", 
    "numeric")), c202 = structure(c(2, 2, 4, 6, 6, 2, 6, NA, 
    NA, NA), label = "Current legal status of firm", class = c("labelled", 
    "numeric")), c203a = structure(c(100, 100, 100, 100, 0, 100, 
    0, 100, 0, 0), label = "Percentage of firm owned by domestic private sector", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c203b = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 
    100, 100), label = "Percentage of firm owned by foreign private sector", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c203c = structure(c(0, 0, 0, 0, 100, 0, 0, 0, 
    0, 0), label = "Percentage of firm owned by government/state", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c203d = structure(c(NA, 0, 0, NA, NA, 0, 100, 
    0, 0, 0), label = "Percentage of firm owned by other types of owner", format.stata = "%8.2f", class = c("labelled", 
    "numeric")), c2041 = structure(c(2, 2, 2, NA, NA, 2, 2, 2, 
    2, 2), label = "Firm previously owned by government?", class = c("labelled", 
    "numeric")), c2042 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Year of privatization", format.stata = "%4.0f", class = c("labelled", 
    "numeric")), c205a = structure(c(NA, 25, 100, NA, 100, 100, 
    NA, NA, 100, 100), label = "Percentage owned by largest shareholder", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c205b1 = structure(c(NA, NA, NA, NA, NA, NA, 
    NA, NA, 1, 1), label = "Largest shareholder: individual", class = c("labelled", 
    "numeric")), c205b2 = structure(c(NA, 1, 1, NA, NA, 1, NA, 
    NA, NA, NA), label = "Largest shareholder: family", class = c("labelled", 
    "numeric")), c205b3 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: domestic company", class = c("labelled", 
    "numeric")), c205b4 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: foreign company", class = c("labelled", 
    "numeric")), c205b5 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: bank", class = c("labelled", 
    "numeric")), c205b6 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: investment fund", class = c("labelled", 
    "numeric")), c205b7 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: firm managers", class = c("labelled", 
    "numeric")), c205b8 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: firm employees", class = c("labelled", 
    "numeric")), c205b9 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: government", class = c("labelled", 
    "numeric")), c205b10 = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), label = "Largest shareholder: other", class = c("labelled", 
    "numeric")), c205c = structure(c(1, 1, 1, NA, 2, 1, 1, 1, 
    1, 1), label = "Is principal shareholder also the manager/director?", class = c("labelled", 
    "numeric")), c205d = structure(c(1, 1, 1, NA, NA, 2, 1, 1, 
    1, 1), label = "Is the principal owner male?", class = c("labelled", 
    "numeric")), c206a = structure(c(1, 1, 1, NA, NA, 1, 0, 1, 
    2, 3), label = "Number of separate operating facilities in this country", format.stata = "%4.0f", class = c("labelled", 
    "numeric")), c206b = structure(c(NA, 2, 2, NA, NA, 2, 2, 
    2, 2, 1), label = "Operations in other countries?", class = c("labelled", 
    "numeric")), c2071 = structure(c(1, 5, 1, 1, 2, 1, 5, 1, 
    1, 3), label = "Location of establishment", class = c("labelled", 
    "numeric")), c2072 = structure(c(NA, NA, NA, NA, NA, 1, 5, 
    NA, 1, 3), label = "Location of headquarters", class = c("labelled", 
    "numeric")), c208 = structure(c(NA, 1723, NA, NA, NA, NA, 
    NA, NA, NA, 1810), label = "Main product line", format.stata = "%10.0g", class = c("labelled", 
    "numeric")), c209a = structure(c(NA, 1, 2, NA, NA, NA, NA, 
    NA, NA, NA), label = "Other income generating activities", class = c("labelled", 
    "numeric")), c209ba = structure(c(NA, 0, NA, NA, NA, 100, 
    NA, NA, NA, NA), label = "Percent workers' time: manufacturing", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c209bb = structure(c(NA, 0, NA, NA, NA, 0, NA, 
    NA, NA, NA), label = "Percent workers' time: services", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c209bc = structure(c(NA, 0, NA, NA, NA, NA, 
    NA, NA, NA, NA), label = "Percent workers' time: commerce (retail/wholesale trade)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c209bd = structure(c(NA, 0, NA, NA, NA, NA, 
    NA, NA, NA, NA), label = "Percent workers' time: construction", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c209be = structure(c(NA, 1, NA, NA, NA, 0, NA, 
    NA, NA, NA), label = "Percent workers' time: other", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c210a = structure(c(NA, 50, 1, NA, NA, NA, NA, 
    NA, 65, 0), label = "Main product line: firm's share of local market", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c210b = structure(c(12, 25, 1, 6, 44.6399993896484, 
    50, NA, 5, 45, 0), label = "Main product line: firm's share of national market", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c211a1 = structure(c(100, 100, 100, 99, 100, 
    90, 100, 100, 95, 0), label = "Percent of sales sold domestically", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c211a2 = structure(c(0, 0, 0, 0, 0, 10, 0, 0, 
    5, 100), label = "Percent of sales exported directly", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c211a3 = structure(c(0, 0, 0, 1, 0, 0, NA, 0, 
    0, 0), label = "Percent of sales exported indirectly", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c211b1 = structure(c(NA, 0, 0, 0, 0, 0, NA, 
    NA, NA, NA), label = "Percentage of domestic sales to government", format.stata = "%9.2f", class = c("labelled", 
    "numeric")), c282a2y = structure(c(451645, NA, NA, 49609, 
    1061449, 21, 38966.6171875, 43.0904998779297, NA, NA), label = "Total liabilities 2 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282b2y = structure(c(NA, NA, NA, 393, 81012, 
    1, 0, NA, NA, NA), label = "Long-term liabilities (>1 year) 2 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282c2y = structure(c(NA, NA, NA, 55193, 980437, 
    20, 7687.90576171875, NA, NA, NA), label = "Short-term liabilities (<1 year) 2 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282d2y = structure(c(NA, NA, 5500, 55193, 19194, 
    NA, 0, NA, NA, NA), label = "Payable short-term liabilities 2 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282e2y = structure(c(10000, NA, NA, 5000, 20329, 
    12, 29826.701171875, 40, NA, NA), label = "Equity–share capital 2 years ago (thousands LCU)", format.stata = "%10.0g", class = c("labelled", 
    "numeric")), c282f2y = structure(c(305436, NA, 5500, 916, 
    NA, NA, 1452.00903320312, 6.09049987792969, NA, NA), label = "Retained earnings 2 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282a3y = structure(c(456618, NA, NA, NA, 1063217, 
    16, 39676.3984375, 43.7501029968262, NA, NA), label = "Total liabilities 3 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282b3y = structure(c(NA, NA, NA, NA, 152156, 
    5, 0, NA, NA, NA), label = "Long-term liabilities (>1 year) 3 years ago (thousands LCU)", format.stata = "%10.0g", class = c("labelled", 
    "numeric")), c282c3y = structure(c(NA, NA, NA, NA, 911061, 
    11, 6299.255859375, NA, NA, NA), label = "Short-term liabilities (<1 year) 3 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282d3y = structure(c(NA, NA, NA, NA, 20964, 
    NA, 0, NA, NA, NA), label = "Payable short-term liabilities 3 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), c282e3y = structure(c(10000, NA, NA, NA, 124531, 
    9, 32368.564453125, 40, NA, NA), label = "Equity–share capital 3 years ago (thousands LCU)", format.stata = "%10.0g", class = c("labelled", 
    "numeric")), c282f3y = structure(c(282840, NA, NA, NA, NA, 
    NA, 1008.58001708984, 6.75010013580322, NA, NA), label = "Retained earnings 3 years ago (thousands LCU)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), gni = structure(c(370, 2760, 300, 970, 1100, 
    1830, 150, 100, 1910, 960), label = "Gross National Income per capita, Atlas Method (current ), World Development Ind", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), pop = structure(c(135683664, 176596256, 13403644, 
    1280400000, 1288400000, 13007942, 4296700, 67217840, 12307091, 
    6968512), label = "Population, Total, in 2005 (World Development Indicators)", format.stata = "%9.0g", class = c("labelled", 
    "numeric")), country_proper = structure(c("Bangladesh", "Brazil", 
    "Cambodia", "China", "China", "Ecuador", "Eritrea", "Ethiopia", 
    "Guatemala", "Honduras"), format.stata = "%22s", label = "", class = c("labelled", 
    "character")), c_abbr = structure(c("BGD", "BRA", "KHM", 
    "CHN", "CHN", "ECU", "ERI", "ETH", "GTM", "HND"), format.stata = "%9s", label = "", class = c("labelled", 
    "character")), countryyear = structure(c("Bangladesh2002", 
    "Brazil2003", "Cambodia2003", "China2002", "China2003", "Ecuador2003", 
    "Eritrea2002", "Ethiopia2002", "Guatemala2003", "Honduras2003"
    ), label = "Country", format.stata = "%21s", class = c("labelled", 
    "character")), iso3c = structure(c("BGD", "BRA", "KHM", "CHN", 
    "CHN", "ECU", "ERI", "ETH", "GTM", "HND"), label = "", class = c("labelled", 
    "character")), cname = structure(c("Bangladesh", "Brazil", 
    "Cambodia", "China", "China", "Ecuador", "Eritrea", "Ethiopia", 
    "Guatemala", "Honduras"), label = "", class = c("labelled", 
    "character")), cyear = structure(c("2002", "2003", "2003", 
    "2002", "2003", "2003", "2002", "2002", "2003", "2003"), label = "", class = c("labelled", 
    "character"))), .Names = c("matchcode", "idstd", "year", 
"country", "wt", "region", "income", "industry", "sector", "ownership", 
"exporter", "c201", "c202", "c203a", "c203b", "c203c", "c203d", 
"c2041", "c2042", "c205a", "c205b1", "c205b2", "c205b3", "c205b4", 
"c205b5", "c205b6", "c205b7", "c205b8", "c205b9", "c205b10", 
"c205c", "c205d", "c206a", "c206b", "c2071", "c2072", "c208", 
"c209a", "c209ba", "c209bb", "c209bc", "c209bd", "c209be", "c210a", 
"c210b", "c211a1", "c211a2", "c211a3", "c211b1", "c282a2y", "c282b2y", 
"c282c2y", "c282d2y", "c282e2y", "c282f2y", "c282a3y", "c282b3y", 
"c282c3y", "c282d3y", "c282e3y", "c282f3y", "gni", "pop", "country_proper", 
"c_abbr", "countryyear", "iso3c", "cname", "cyear"), class = c("data.table", 
"data.frame"), row.names = c(NA, -10L), .internal.selfref = <pointer: 0x0000000002570788>)

library(party)
colnames(iris)
cf1 <- cforest(Species ~ . , data= iris, control=cforest_unbiased(mtry=2,ntree=50))

varimp(cf1)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
 0.047636364  0.002909091  0.354181818  0.227636364