
R glmnet: same lambda, but different coefficients from glmnet() and cv.glmnet()


Even though I am using the same lambda, the coefficients produced by cv.glmnet() seem to differ from those produced by glmnet(). Why is this? Shouldn't they be the same?

library(glmnet)

# Data dimensions
num.samples <- 30
num.genes <- 17000

# Data objects - note that both X and Y are scaled
set.seed(123)
Y <- matrix(rnorm(num.samples), ncol=1)
set.seed(1234)
X <- matrix(rnorm(num.samples*num.genes), ncol=num.genes)

# Run cv.glmnet: get lambda.min and coef
fit.cv <- cv.glmnet(X, Y, nfolds=num.samples, intercept=FALSE)
fit.cv.lambda <- fit.cv$lambda.min
fit.cv.coef <- coef(fit.cv, s = fit.cv.lambda)[,1][2:(num.genes+1)]

# Run glmnet with lambda.min from cv.glmnet: get coef
second.lambda=fit.cv.lambda-0.0001 ## second.lambda included because glmnet manual recommends using >1 lambda for glmnet()
fit <- glmnet(X, Y, lambda=c(fit.cv.lambda,second.lambda), intercept=FALSE) 
fit.lambda <- fit$lambda[1]
fit.coef <- coef(fit, s = fit.cv.lambda)[,1][2:(num.genes+1)]

# Lambda is the same, but coefficients are not
fit.cv.lambda==fit.lambda ## TRUE
not.equal = which(fit.cv.coef != fit.coef)
length(not.equal) ## 18
mean(abs(fit.cv.coef[not.equal] - fit.coef[not.equal])) ## 0.0004038209
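Short answer: this is a numerical precision issue. The differences you are seeing are not caused by differences between cv.glmnet and glmnet. Instead, they come from a combination of: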

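  • the penalty path, lambda, being different between the two objects. I mean the entire penalty path, not just whether the penalty of interest is present in both objects;
  • the default convergence threshold, thresh= (1e-7 in glmnet).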
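Because glmnet fits the path with warm starts (each fit starts from the solution at the previous penalty) and stops as soon as the convergence criterion is met, the estimate at a given lambda depends on the penalties computed before it. If you want the estimates from two glmnet or cv.glmnet objects with different penalty paths to be equal (or at least very close), lower the convergence threshold with the thresh= option in both functions. I would also recommend setting exact=TRUE in coef().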
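The mechanism behind the short answer can be reproduced without glmnet. The sketch below is a plain R lasso solver using coordinate descent with warm starts; cd_lasso, fit_path_end and make_path are hypothetical helpers, the data are simulated, and the stopping rule (largest coefficient change in a sweep below thresh) is a simplified stand-in for glmnet's own criterion, which its documentation states in terms of the change in the objective relative to the null deviance. Two paths that end at exactly the same lambda typically disagree slightly under a loose threshold and agree to many digits under a tight one.

```r
# Soft-thresholding operator used in lasso coordinate descent
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: coordinate descent for one penalty, minimizing
# (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1 without an intercept.
# Stops when the largest coefficient change in a full sweep is below
# `thresh` (a simplified stand-in for glmnet's convergence rule).
cd_lasso <- function(X, y, lambda, beta, thresh, maxit = 100000L) {
  n <- nrow(X)
  xx <- colSums(X^2) / n              # (1/n) * x_j'x_j for each column
  r <- drop(y - X %*% beta)           # residual at the warm start
  for (it in seq_len(maxit)) {
    max_change <- 0
    for (j in seq_along(beta)) {
      old <- beta[j]
      z <- sum(X[, j] * r) / n + xx[j] * old   # fit to the partial residual
      beta[j] <- soft_threshold(z, lambda) / xx[j]
      delta <- beta[j] - old
      if (delta != 0) {
        r <- r - X[, j] * delta       # keep the residual in sync
        max_change <- max(max_change, abs(delta))
      }
    }
    if (max_change < thresh) break
  }
  beta
}

# Hypothetical helper: walk a decreasing penalty path with warm starts
# and return the coefficients at the last penalty.
fit_path_end <- function(X, y, path, thresh) {
  beta <- numeric(ncol(X))
  for (lam in path) beta <- cd_lasso(X, y, lam, beta, thresh)
  beta
}

set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:3] %*% c(2, -1, 0.5) + rnorm(n))

lambda_max <- max(abs(crossprod(X, y))) / n   # all coefficients are 0 here
target <- 0.05 * lambda_max
# Two log-spaced paths of different length that end at exactly `target`
make_path <- function(k) {
  c(exp(seq(log(lambda_max), log(target), length.out = k))[-k], target)
}
path_a <- make_path(5)
path_b <- make_path(20)

diff_loose <- max(abs(fit_path_end(X, y, path_a, 1e-2) -
                      fit_path_end(X, y, path_b, 1e-2)))
diff_tight <- max(abs(fit_path_end(X, y, path_a, 1e-12) -
                      fit_path_end(X, y, path_b, 1e-12)))
c(loose = diff_loose, tight = diff_tight)
```

Both paths solve the same convex problem at the same final penalty, so any gap between them comes only from stopping early after different warm starts, which is the situation described above.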

Extended answer: below, a few examples illustrate this. Before going through them, it helps to understand the logic of the coef() function (which calls predict.glmnet() with type="coefficients"):

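  • If you request coefficient estimates for a penalty that was already computed in the object's original penalty path, coef() simply returns the original object's estimates.

  • If you request coefficient estimates for a penalty that is not in the original path and exact=FALSE (the default), the coefficients are obtained by linear interpolation between the estimates at the neighboring penalties in the original path.

  • If you request coefficient estimates for a penalty that is not in the original path and exact=TRUE, the new penalty is added to the original path and the whole model is refit to obtain the estimates.

Using the fit and cvfit objects created in Example 1 below (same data, same penalty path), all three cases give identical estimates:

# Requested penalty in original path
coef1 <- coef(fit, s = fit$lambda[10])
coef2 <- coef(cvfit, s = fit$lambda[10])
all.equal(coef1, coef2) #TRUE

# Requested penalty not in original path -- uses interpolation
coef1 <- coef(fit, s = 0.40)
coef2 <- coef(cvfit, s = 0.40)
all.equal(coef1, coef2) #TRUE

# Force glmnet to refit the model with s added to the penalty path
coef1 <- coef(fit, s = 0.40, exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = 0.40, exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) #TRUE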
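The exact=FALSE case can be sketched in base R. interp_coef is a hypothetical helper and the path and coefficients are made-up numbers; it mimics the idea of linear interpolation in lambda between the two penalties that bracket s, not glmnet's internal code.

```r
# Hypothetical helper: linearly interpolate coefficients at penalty s from a
# decreasing penalty path `lambda` and a coefficient matrix `beta`
# (one column per penalty), in the spirit of coef(..., exact = FALSE).
interp_coef <- function(lambda, beta, s) {
  stopifnot(s <= lambda[1], s >= lambda[length(lambda)])
  hit <- which(lambda == s)
  if (length(hit) > 0) return(beta[, hit[1]])   # s is already on the path
  right <- which(lambda < s)[1]                 # first penalty below s
  left <- right - 1                             # last penalty above s
  w <- (s - lambda[right]) / (lambda[left] - lambda[right])
  w * beta[, left] + (1 - w) * beta[, right]
}

# Toy path with three penalties and two coefficients (made-up numbers)
lambda <- c(1.0, 0.5, 0.1)
beta <- cbind(c(0, 0), c(0.2, 0), c(0.6, -0.3))
interp_coef(lambda, beta, 0.5)   # on the path: column 2 returned unchanged
interp_coef(lambda, beta, 0.3)   # halfway between 0.5 and 0.1: about 0.40, -0.15
```

Two objects with different grids interpolate between different neighbors, which is another way the "same" lambda can give different coefficients when exact=FALSE.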

Example 1: same penalty path, same estimates

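With the default arguments, glmnet() and cv.glmnet() compute the same penalty path for a given dataset (although, because of the stopping criteria, the number of penalties actually computed along the path may differ). We show this below: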

library(glmnet)

# Data dimensions
num.samples <- 30
num.genes <- 17000

# Data objects - note that both X and Y are scaled
set.seed(123)
Y <- matrix(rnorm(num.samples), ncol=1)
X <- matrix(rnorm(num.samples*num.genes), ncol=num.genes)

# Run cv.glmnet and glmnet, obtain same penalty path up to min(num penalty)
fit <- glmnet(X, Y, intercept=FALSE)
cvfit <- cv.glmnet(X, Y, intercept= FALSE)
min_num_lambdas = min(length(fit$lambda), length(cvfit$lambda))
all.equal(fit$lambda[1:min_num_lambdas], cvfit$lambda[1:min_num_lambdas]) #TRUE
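The default grids can be mimicked in base R. As we understand glmnet's documentation, the default path runs from lambda_max (the smallest penalty at which every coefficient is zero) down to lambda.min.ratio * lambda_max on a log scale, and lambda.min.ratio defaults to 0.01 when there are fewer observations than variables, as here. default_path is a hypothetical helper and lambda_max = 1 is an arbitrary value. With the same nlambda the grid depends only on the data, which is why fit and cvfit agree; changing nlambda from 100 to 99 leaves only the endpoints in common.

```r
# Hypothetical helper: a geometric (log-scale) penalty grid from lambda_max
# down to lambda_min_ratio * lambda_max, assumed to match glmnet's default shape.
default_path <- function(lambda_max, nlambda, lambda_min_ratio = 0.01) {
  exp(seq(log(lambda_max), log(lambda_min_ratio * lambda_max),
          length.out = nlambda))
}

path100 <- default_path(lambda_max = 1, nlambda = 100)   # default nlambda
path99  <- default_path(lambda_max = 1, nlambda = 99)    # nlambda = 99

head(path100, 3)
head(path99, 3)
# Count penalty values shared by the two grids (up to rounding error)
n_shared <- sum(abs(outer(path99, path100, "-")) < 1e-12)
n_shared   # 2: only the first and last penalties coincide
```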

Example 2: different penalty paths, different estimates

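We make a small change to glmnet() by setting nlambda = 99, which makes the penalty paths of fit and cvfit different. Now the estimates differ, even with the exact=TRUE option: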

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99)
coef1 <- coef(fit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) # Mean relative difference: 0.002006215
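Since the discrepancy is numerical, an exact test such as the question's fit.cv.coef != fit.coef flags differences of any size, while all.equal() reports a mean relative difference. A tolerance-based count separates negligible differences from meaningful ones; count_different is a hypothetical helper, and the vectors and tolerances below are arbitrary illustrations.

```r
# Hypothetical helper: count coefficients that differ by more than `tol`
# in absolute value, instead of testing exact equality with `!=`.
count_different <- function(a, b, tol = 1e-8) {
  stopifnot(length(a) == length(b))
  sum(abs(a - b) > tol)
}

a <- c(0, 0.25, -0.1000000001, 0)
b <- c(0, 0.25, -0.1, 0.0004)
sum(a != b)                         # exact comparison: 2
count_different(a, b)               # tol = 1e-8: 1
count_different(a, b, tol = 1e-3)   # tol = 1e-3: 0
```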

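Lowering the convergence threshold in both functions brings the estimates back together, even though the penalty paths are still different:

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99, thresh = 1e-20)
cvfit <- cv.glmnet(X, Y, intercept= FALSE, thresh = 1e-20)
coef1 <- coef(fit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) #TRUE

Alternatively, fit with the default threshold and pass a lower thresh= to coef() together with exact=TRUE, so that it is used when the model is refit:

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99)
cvfit <- cv.glmnet(X, Y, intercept= FALSE)
coef1 <- coef(fit, s = 0.40, exact = TRUE, x = X, y = Y, thresh = 1e-20)
coef2 <- coef(cvfit, s = 0.40, exact = TRUE, x = X, y = Y, thresh = 1e-20)
all.equal(coef1, coef2) #TRUE

How much thresh= needs to be lowered depends entirely on the data.

Wow! Good catch, that's a great idea. I tried the same parameters with another pair of random seeds. From the R code, it looks like the "glmnet" object inside cv.glmnet and the "glmnet" object returned by glmnet are created in the same way... This might be related: … Specifically, the model is underdetermined.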