R: contrasts can be applied only to factors with 2 or more levels


I want to predict sales with a linear regression. This is the data table I use for modeling:

> store
     Store Sales CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2 Promo2SinceWeek Promo2SinceYear Assortment_a
  1:     3  8314               14130                        12                     2006      1              14            2011            1
  2:     3  8977               14130                        12                     2006      1              14            2011            1
  3:     3  7610               14130                        12                     2006      1              14            2011            1
  4:     3  8864               14130                        12                     2006      1              14            2011            1
  5:     3  8107               14130                        12                     2006      1              14            2011            1
 ---                                                                                                                                       
775:     3 12247               14130                        12                     2006      1              14            2011            1
776:     3  4523               14130                        12                     2006      1              14            2011            1
777:     3  6069               14130                        12                     2006      1              14            2011            1
778:     3  5902               14130                        12                     2006      1              14            2011            1
779:     3  6823               14130                        12                     2006      1              14            2011            1
     Assortment_b Assortment_c StoreType_a StoreType_b StoreType_c StoreType_d DayOfWeek Open Promo SchoolHoliday DateYear DateMonth
  1:            0            0           1           0           0           0         5    1     1             1     2015         7
  2:            0            0           1           0           0           0         4    1     1             1     2015         7
  3:            0            0           1           0           0           0         3    1     1             1     2015         7
  4:            0            0           1           0           0           0         2    1     1             1     2015         7
  5:            0            0           1           0           0           0         1    1     1             1     2015         7
 ---                                                                                                                                
775:            0            0           1           0           0           0         1    1     1             0     2013         1
776:            0            0           1           0           0           0         6    1     0             0     2013         1
777:            0            0           1           0           0           0         5    1     0             1     2013         1
778:            0            0           1           0           0           0         4    1     0             1     2013         1
779:            0            0           1           0           0           0         3    1     0             1     2013         1
     DateDay DateWeek StateHoliday_0 StateHoliday_a StateHoliday_b StateHoliday_c CompetitionOpen PromoOpen IspromoinSales Prediction
  1:      31       30              1              0              0              0             103     52.00              1          0
  2:      30       30              1              0              0              0             103     52.00              1          0
  3:      29       30              1              0              0              0             103     52.00              1          0
  4:      28       30              1              0              0              0             103     52.00              1          0
  5:      27       30              1              0              0              0             103     52.00              1          0
 ---                                                                                                                                 
775:       7        1              1              0              0              0              73     20.75              1          0
776:       5        0              1              0              0              0              73     20.50              1          0
777:       4        0              1              0              0              0              73     20.50              1          0
778:       3        0              1              0              0              0              73     20.50              1          0
779:       2        0              1              0              0              0              73     20.50              1          0
>

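But I get this error:

contrasts can be applied only to factors with 2 or more levels

I applied what @Scott suggested, since I do not have any NA values.

I need to know which columns should be converted to factor variables in the model.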

  > lapply(store, function(x) ifelse(is.factor(x) | is.integer(x), levels(factor(x)), "numeric"))
$Store
[1] "3"

$Sales
[1] "numeric"

$CompetitionDistance
[1] "14130"

$CompetitionOpenSinceMonth
[1] "12"

$CompetitionOpenSinceYear
[1] "2006"

$Promo2
[1] "1"

$Promo2SinceWeek
[1] "14"

$Promo2SinceYear
[1] "2011"

$Assortment_a
[1] "1"

$Assortment_b
[1] "0"

$Assortment_c
[1] "0"

$StoreType_a
[1] "1"

$StoreType_b
[1] "0"

$StoreType_c
[1] "0"

$StoreType_d
[1] "0"

$DayOfWeek
[1] "1"

$Open
[1] "1"

$Promo
[1] "0"

$SchoolHoliday
[1] "0"

$DateYear
[1] "numeric"

$DateMonth
[1] "numeric"

$DateDay
[1] "numeric"

$DateWeek
[1] "numeric"

$StateHoliday_0
[1] "1"

$StateHoliday_a
[1] "0"

$StateHoliday_b
[1] "0"

$StateHoliday_c
[1] "0"

$CompetitionOpen
[1] "numeric"

$PromoOpen
[1] "numeric"

$IspromoinSales
[1] "numeric"

$Prediction
[1] "numeric"

My model is shown below. Please look only at the lm call: how should I write it?

M<-matrix(0,nrow=10,ncol = 1)
store <- data[Store == 3,]  # select a single store by its unique number
shuffledIndices <- sample(nrow(store))  # shuffle the row order
setDT(store)[,Prediction:=0]
z <- nrow(store)
for (i in 1:10) 
{    # 10-fold cross-validation
  sampleIndex <- floor(1+0.1*(i-1)*z):(0.1*i*z)  # 10% of the full data set is selected
  test <- store[shuffledIndices[sampleIndex],]  # it is used as the test set
  train <- store[shuffledIndices[-sampleIndex],]  # the rest is used as the training set
  modell <- lm(Sales ~ as.factor(CompetitionDistance) + as.factor(CompetitionOpenSinceMonth) + as.factor(CompetitionOpenSinceYear) + 
                 as.factor(Promo2)+as.factor(Promo2SinceWeek)+as.factor(Promo2SinceYear)+as.factor(Assortment_a)+as.factor(Assortment_b)+as.factor(Assortment_c)+
                 as.factor(StoreType_a)+as.factor(StoreType_b)+as.factor(StoreType_c)+as.factor(StoreType_d)+as.factor(DayOfWeek)+as.factor(Open)+SchoolHoliday+
                 as.factor(Promo)+as.factor(StateHoliday_0)+as.factor(StateHoliday_a)+as.factor(StateHoliday_b)+as.factor(StateHoliday_c)+
                 as.factor(DateYear)+as.factor(DateMonth)+as.factor(DateDay)+as.factor(DateWeek)+as.factor(CompetitionOpen)+as.factor(PromoOpen)+as.factor(IspromoinSales),train)  # a linear model is fitted to the training set
  store[shuffledIndices[sampleIndex],Prediction:=predict(modell,test)] # predictions are generated for the test set based on the model
  foldPred <- store[shuffledIndices[sampleIndex],Prediction]  # predictions for this fold's test rows only
  M[i,1]<-(round(sqrt(mean((foldPred-test$Sales)^2))/mean(test$Sales),4))
}

plot(1:10,M[,1],type='b',xlab="i",ylab="rmse%")

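M

The problem is that your model contains constant variables. They add no information and should therefore be excluded from the modeling.

Why? You want to model Sales given all the other variables. A variable that is constant cannot tell you anything about how Sales changes, because it never changes itself. In addition, as.factor() turns a constant column into a factor with a single level, which is exactly what the contrasts error complains about.
If you modify the model as follows, the code should work: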
modell <- lm(Sales ~ as.factor(DayOfWeek) + SchoolHoliday + as.factor(Promo) + 
               as.factor(DateYear) + as.factor(DateMonth) + as.factor(DateDay) + 
               as.factor(DateWeek) + as.factor(CompetitionOpen) + as.factor(PromoOpen), 
             data = train)
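
Rather than listing the non-constant columns by hand, one could detect them programmatically. The following is only a minimal sketch under stated assumptions, not the answerer's code: `toy` is a small made-up data frame standing in for `store`, and `varying_cols()` is a hypothetical helper. It first reproduces the error on a constant column, then builds the formula from the columns that actually vary.

```r
# Toy data standing in for one store (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  Sales     = rnorm(20, mean = 8000, sd = 500),
  Store     = 3L,                      # constant: only one distinct value
  Promo     = rep(c(0L, 1L), 10),
  DayOfWeek = rep(1:5, 4)
)

# A factor built from a constant column has a single level, so lm() fails
err <- tryCatch(
  lm(Sales ~ as.factor(Store) + as.factor(Promo), data = toy),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical helper: predictor columns with at least two distinct values
varying_cols <- function(d, response) {
  keep <- vapply(d, function(x) length(unique(x)) > 1, logical(1))
  setdiff(names(d)[keep], response)
}

predictors <- varying_cols(toy, "Sales")

# Build the formula from the remaining columns, treating each one as a factor
form <- reformulate(sprintf("as.factor(%s)", predictors), response = "Sales")
fit  <- lm(form, data = toy)
print(form)
```

Inside a cross-validation loop the same check would probably have to be run on each training fold, since a column that varies in the full data can still be constant within one fold. A factor level that appears only in the test fold is a separate issue: `predict()` stops with a "new levels" error in that case.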

modell

It is hard to say without a truly reproducible example, but my guess is that during the cross-validation some factors end up with only one level. You should also check, on the whole data set, whether the columns used as factors in the model have more than one level.

@kath, thanks for your comment, but if you look at the edited question you can see the data used for modeling. It gives exactly this error, the contrasts error.

In the sample data, many columns have only one level (e.g. CompetitionDistance). To check on the whole data whether the columns you use have more than one level, run lapply(store, function(x) ifelse(is.factor(x) | is.integer(x), levels(factor(x)), "numeric")).

@kath, as you suggested, please see the edited code: I converted the columns that have only one level to factors and left the type of the numeric ones unchanged. But I still get the same error! What should I do?
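
One detail about the check quoted in the comments is worth noting: `ifelse()` returns a result of the same length as its test, and here the test is a single logical value, so only the first level of each column is printed. A column such as `DayOfWeek` can therefore look as if it had one level when it has several. Below is a minimal sketch of a count-based check instead, using a made-up data frame `toy` in place of `store` (the values are assumptions for illustration only):

```r
# Made-up stand-in for `store`: one constant column, two varying ones
toy <- data.frame(
  Store     = rep(3L, 6),
  DayOfWeek = c(5L, 4L, 3L, 2L, 1L, 6L),
  Promo     = c(1L, 1L, 0L, 0L, 1L, 0L)
)

# ifelse() with a length-one test keeps only the first level
first_only <- ifelse(is.integer(toy$DayOfWeek),
                     levels(factor(toy$DayOfWeek)),
                     "numeric")
print(first_only)

# Counting distinct values per column shows which columns are really constant
n_levels <- vapply(toy, function(x) length(unique(x)), integer(1))
print(n_levels)

# Columns with fewer than two distinct values cannot be used as factors in lm()
constant_cols <- names(n_levels)[n_levels < 2]
print(constant_cols)
```

Columns listed in `constant_cols` are the ones to leave out of the formula. With the data.table shown in the question, `store[, lapply(.SD, uniqueN)]` should give the same counts.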