Error in the oaxaca package in R: non-conformable arguments


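I am trying to run an Oaxaca decomposition with the oaxaca package, but including certain variables seems to trigger the error "non-conformable arguments". As far as I can tell, the error only appears when I include some of my factor/categorical variables, not all of them.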

Here is a minimal reproducible example of my dataset wvs_reduc:

wvs_reduc <- structure(list(emp = c(1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 
1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0), education = structure(c(4L, 3L, 2L, 2L, 3L, 
3L, 2L, 6L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 4L, 1L, 2L, 4L, 
4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 4L, 4L, 4L, 3L, 
2L, 4L, 3L), .Label = c("No Formal Education", "Primary or Less", 
"Incomplete Secondary", "Secondary", "Incomplete University", 
"University or More"), class = "factor"), marital = structure(c(1L, 
1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 4L, 3L, 1L, 1L, 
4L, 3L, 1L, 3L, 4L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 4L, 4L, 4L, 
3L, 3L, 4L, 3L, 3L, 4L, 3L), .Label = c("single", "cohabiting", 
"married", "previously married"), class = "factor"), Arab = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-40L), class = c("tbl_df", "tbl", "data.frame"))

When I run the command:

library(oaxaca)
oaxaca(emp ~ education + marital | Arab, 
       data = wvs_reduc, group.weights = 0, R = 10)

I get the error message: Error in t(x.mean.A) %*% delta.A : non-conformable arguments

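In case it is relevant: when I run the command on my larger dataset, I get a similar but not identical error when I include the variable marital (but not education or the other factor variables):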

Error in t(x.mean.A - x.mean.B) %*% beta.B : non-conformable arguments

Looking at the underlying code in oaxaca:::.oaxaca.wrap, the part that errors is:

E <- as.numeric(t(x.mean.A - x.mean.B) %*% beta.B)
C <- as.numeric(t(x.mean.B) %*% (beta.A - beta.B))
I <- as.numeric(t(x.mean.A - x.mean.B) %*% (beta.A - beta.B))
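A toy illustration of what "non-conformable arguments" means for these lines: each term is an inner product t(x.mean) %*% beta, which only works when the mean vector and the coefficient vector have the same length. The numbers below are made up, and the variable names x_mean_A and beta_B are hypothetical stand-ins for the package's internal objects.

```r
# Made-up values: group A has an intercept plus 3 dummy means,
# but the (hypothetical) group B coefficient vector lost one dummy.
x_mean_A <- c(1, 0.4, 0.3, 0.2)
beta_B   <- c(0.5, 0.1, -0.2)

# 1 x 4 row times a length-3 vector cannot be multiplied.
res <- tryCatch(
  as.numeric(t(x_mean_A) %*% beta_B),
  error = function(e) conditionMessage(e)
)
print(res)

# With matching lengths the same expression works fine.
ok <- as.numeric(t(x_mean_A[1:3]) %*% beta_B)
print(ok)
```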

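So the all-zero columns get dropped, and I would say you need to make sure the levels are spread across your grouping categories. We can confirm this by simulating these variables: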
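A minimal sketch of the "zeros get dropped" point, assuming the decomposition fits a separate regression per group (as the beta.A/beta.B names suggest). In the toy data below (made up for illustration), level "c" never occurs in group 0, so its dummy column is all zeros in that group and lm() returns NA for that coefficient. If the NA term is dropped, group 0 ends up with fewer coefficients than there are dummy columns overall, which is the kind of length mismatch that %*% rejects.

```r
# Hypothetical toy data: level "c" of f only appears in group 1.
toy <- data.frame(
  y     = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0),
  f     = factor(c("a", "b", "c", "a", "b", "a", "b", "a", "b", "a")),
  group = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)

sub0 <- subset(toy, group == 0)   # subsetting keeps all factor levels
X0   <- model.matrix(~ f, data = sub0)
print(colSums(X0))                # the "fc" dummy sums to 0 in group 0

fit0 <- lm(y ~ f, data = sub0)
print(coef(fit0))                 # coefficient for fc is NA (aliased)

# Full-data design has 3 columns; group 0 only estimates 2 of them.
n_full <- ncol(model.matrix(~ f, data = toy))
n_grp0 <- length(na.omit(coef(fit0)))
print(c(full = n_full, group0 = n_grp0))
```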

set.seed(111)
wvs_reduc$test_education =sample(levels(wvs_reduc$education),nrow(wvs_reduc),replace=TRUE)
wvs_reduc$test_marital =sample(levels(wvs_reduc$marital),nrow(wvs_reduc),replace=TRUE)

We run this with bootstrapping turned off:

oaxaca(emp ~ test_education + test_marital  | Arab, data=wvs_reduc,R=NULL)

If we turn bootstrapping on, it crashes, because during resampling it can run into the same error:

oaxaca(emp ~ test_education + test_marital  | Arab, data=wvs_reduc,R=2)
oaxaca: oaxaca() performing analysis. Please wait.

Bootstrapping standard errors:
1 / 2 (50%)
Error in t(x.mean.A) %*% delta.A : non-conformable arguments
In addition: There were 11 warnings (use warnings() to see them)
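To see why turning the bootstrap on can bring the error back, here is a rough simulation with made-up data: a group of 20 observations in which a hypothetical level "rare" appears twice. Plain resampling with replacement is assumed here (the exact resampling scheme inside oaxaca is not shown in this thread); under that assumption the level ends up with at most one observation in a large share of draws.

```r
set.seed(1)

# Hypothetical group: 18 "common" and 2 "rare" observations.
f <- factor(c(rep("common", 18), rep("rare", 2)))

n_boot <- 1000
too_sparse <- replicate(n_boot, {
  idx <- sample(seq_along(f), replace = TRUE)  # one bootstrap resample
  sum(f[idx] == "rare") <= 1                   # level now has n <= 1?
})

# Share of resamples where "rare" has 0 or 1 observations; the count is
# Binomial(20, 0.1), so this should be near pbinom(1, 20, 0.1), about 0.39.
print(mean(too_sparse))
print(pbinom(1, 20, 0.1))
```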

So, to make it work on the whole data frame, you need to check whether any level has n=1 (taking the groups into account).
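One way to do that check is to cross-tabulate each factor against the grouping variable and list the cells with very few observations. The helper sparse_cells() below is hypothetical (not part of the oaxaca package), and the threshold min_n = 1 simply follows the n=1 rule of thumb above.

```r
# Hypothetical helper: list factor levels with at most `min_n`
# observations within each group (empty levels show up with n = 0).
sparse_cells <- function(data, vars, group, min_n = 1) {
  out <- lapply(vars, function(v) {
    tab <- table(data[[v]], data[[group]])
    idx <- which(tab <= min_n, arr.ind = TRUE)  # columns "row", "col"
    if (nrow(idx) == 0) return(NULL)
    data.frame(
      variable = v,
      level    = rownames(tab)[idx[, "row"]],
      group    = colnames(tab)[idx[, "col"]],
      n        = as.integer(tab[idx]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)  # NULL entries are ignored by rbind
}

# Toy data (made up): level "c" has 1 obs in group 1 and none in group 0.
toy <- data.frame(
  f     = factor(c("a", "a", "b", "b", "c", "a", "a", "b", "b", "b")),
  group = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)
res <- sparse_cells(toy, "f", "group")
print(res)
```

With the data from the question this would be called as sparse_cells(wvs_reduc, c("education", "marital"), "Arab"). Levels that never occur at all in the reduced data, such as "Incomplete University" or "cohabiting", would be listed with n = 0 in both groups.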

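Hmm, OK: the error happens because one of your factors ends up with only one observation during bootstrapping. The error then occurs in a buggy part of the source code that assumes a matrix, but with n=1 it gets a vector. This is the underlying wrapper oaxaca:::.oaxaca.wrap, and the offending part is this line; there is no way for you to get around it. Do you need the bootstrap?

Hmm... So I set it to not bootstrap. For the reduced dataset I posted here it did not solve the problem, but for my larger dataset it let me add one variable that previously did not work, though not another one. If the problem is n=1, do you think collapsing some categories of my categorical variables would help?

Yes, without the bootstrap it will work. Set R=1. In the example you provided, for instance, Arab is all one block, so it does not work. You can always sample the variables to make sure there is nothing wrong with the data.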
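If you do collapse categories, base R can merge factor levels by assigning a named list to levels(): each name becomes a new level and its elements are the old levels folded into it. The grouping below (including the new label "Tertiary") is purely hypothetical; which categories make sense to merge is a substantive choice for your analysis.

```r
lev <- c("No Formal Education", "Primary or Less", "Incomplete Secondary",
         "Secondary", "Incomplete University", "University or More")

# One observation per original education level from the question.
edu <- factor(lev, levels = lev)

# Hypothetical collapsing: new level name = vector of old levels.
levels(edu) <- list(
  "Primary or Less" = c("No Formal Education", "Primary or Less"),
  "Secondary"       = c("Incomplete Secondary", "Secondary"),
  "Tertiary"        = c("Incomplete University", "University or More")
)
print(table(edu))
```

On the real data the same assignment would be written as levels(wvs_reduc$education) <- list(...), after which a per-group cross-tabulation shows whether any cell is still down to one observation.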
