Transposing identical objects in R


I ran into a strange result today.

To reproduce it, consider the following data frames:

x <- data.frame(x=1:3, y=11:13)
y <- x[1:3, 1:2]

Applying t() to both gives different results:
identical(t(x),t(y))
# [1] FALSE

The difference lies in the column names:
colnames(t(x))
# NULL
colnames(t(y))
# [1] "1" "2" "3"

Given this, if you stack y by column, you get the expected result:
stack(as.data.frame(t(y)))
#   values ind
# 1      1   1
# 2     11   1
# 3      2   2
# 4     12   2
# 5      3   3
# 6     13   3

whereas:
stack(as.data.frame(t(x)))
#     values ind
# 1      1  V1
# 2     11  V1
# 3      2  V2
# 4     12  V2
# 5      3  V3
# 6     13  V3

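In the latter case, as.data.frame() does not find the original column names and generates them automatically.
The culprit is as.matrix(), which is called by t().
A workaround is to set rownames.force: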
rownames(as.matrix(x, rownames.force=TRUE))
# [1] "1" "2" "3"
rownames(as.matrix(y, rownames.force=TRUE))
# [1] "1" "2" "3"
identical(t(as.matrix(x, rownames.force=TRUE)), 
          t(as.matrix(y, rownames.force=TRUE)))
# [1] TRUE

(and rewrite the stack(...) calls accordingly.)
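As a rough sketch of what rewriting the stack() calls might look like, the snippet below forces the row names in as.matrix() before transposing. The helper name stack_by_column is made up for this illustration and does not appear in the original post.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# Hypothetical helper: transpose with forced row names, then stack by column.
stack_by_column <- function(df) {
  m <- t(as.matrix(df, rownames.force = TRUE))
  stack(as.data.frame(m))
}

sx <- stack_by_column(x)
sy <- stack_by_column(y)

identical(sx, sy)   # both data frames now stack the same way
levels(sx$ind)      # "1" "2" "3" instead of "V1" "V2" "V3"
```

Forcing the row names on both sides keeps the result independent of how each data frame stores its row names.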
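My questions are:
Why does as.matrix() treat x and y differently, and
how can you tell the difference between them?
Note that other informational functions do not reveal any difference between x and y: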
identical(attributes(x), attributes(y))
# [1] TRUE
identical(str(x), str(y))
# ...
#[1] TRUE

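Comments on the solutions
Konrad's answer gives a concise and effective explanation of the above behaviour (see also the other answer for more details).
In short, Konrad shows that:
a) x and y are internally different;

b) identical is "too lenient by default" to catch this internal difference.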
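Now, if you take a subset T of a set S such that T contains all the elements of S, then S and T are exactly the same object. So if you take a data frame y that contains all the rows and columns of x, then x and y should be exactly the same object. Unfortunately, x ≠ y!

This behaviour is not only counterintuitive but also obfuscated: the difference is not self-evident, it is purely internal, and even the default identical function cannot see it.
Another natural principle is that transposing two identical (matrix-like) objects yields identical objects. Again, this is broken by the fact that, before transposing, identical is "too lenient"; after transposing, the default identical is enough to see the difference.
For a scientific language like R, this behaviour (even if it is not a bug) is a misbehaviour.
Hopefully this post will draw some attention and the R team will consider revising it.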
As noted in the comments, x and y are not strictly identical. When we call t on a data.frame, t.data.frame is executed:
function (x) 
{
    x <- as.matrix(x)
    NextMethod("t")
}

As @oropendola commented, .row_names_info returns different values for x and y, and the function above is where that difference takes effect.
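To make the effect of the as.matrix() call inside t.data.frame visible, here is a small sketch using the x and y from the question. It only relies on base R; the rownames.force = FALSE comparison is an addition for illustration and is not part of the original answer.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

.row_names_info(x)       # negative: automatic row names
.row_names_info(y)       # positive: row names are not automatic

rownames(as.matrix(x))   # NULL, automatic row names are dropped
rownames(as.matrix(y))   # "1" "2" "3", explicit row names are kept

# With rownames.force = FALSE neither matrix gets row names,
# so the two conversions agree.
same_without_rownames <- identical(as.matrix(x, rownames.force = FALSE),
                                   as.matrix(y, rownames.force = FALSE))
same_without_rownames
```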
So why are the row names of y different? Let's look at [.data.frame, where I have added comments on the key lines:
{
    ... # many lines of code
    xx <- x  #!! this is where xx is defined
    cols <- names(xx)
    x <- vector("list", length(x))
    x <- .Internal(copyDFattr(xx, x))  # This is where I am not sure about
    oldClass(x) <- attr(x, "row.names") <- NULL
    if (has.j) {
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (!is.character(j) && anyNA(nm)) 
            names(nm) <- names(x) <- seq_along(x)
        x <- x[j]
        cols <- names(x)
        if (drop && length(x) == 1L) {
            if (is.character(i)) {
                rows <- attr(xx, "row.names")
                i <- pmatch(i, rows, duplicates.ok = TRUE)
            }
            xj <- .subset2(.subset(xx, j), 1L)
            return(if (length(dim(xj)) != 2L) xj[i] else xj[i, 
                                                            , drop = FALSE])
        }
        if (anyNA(cols)) 
            stop("undefined columns selected")
        if (!is.null(names(nm))) 
            cols <- names(x) <- nm[cols]
        nxx <- structure(seq_along(xx), names = names(xx))
        sxx <- match(nxx[j], seq_along(xx))
    }
    else sxx <- seq_along(x)
    rows <- NULL ## this is where rows is defined, as we give numeric i, the following
    ## if block will not be executed
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    for (j in seq_along(x)) {
        xj <- xx[[sxx[j]]]
        x[[j]] <- if (length(dim(xj)) != 2L) 
            xj[i]
        else xj[i, , drop = FALSE]
    }
    if (drop) {
        n <- length(x)
        if (n == 1L) 
            return(x[[1L]])
        if (n > 1L) {
            xj <- x[[1L]]
            nrow <- if (length(dim(xj)) == 2L) 
                dim(xj)[1L]
            else length(xj)
            drop <- !mdrop && nrow == 1L
        }
        else drop <- FALSE
    }
    if (!drop) { ## drop is False for our case
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")  ## rows changed from NULL to 1,2,3 here
        rows <- rows[i]
        if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) {
            if (!dup && is.character(rows)) 
                dup <- "NA" %in% rows
            if (ina) 
                rows[is.na(rows)] <- "NA"
            if (dup) 
                rows <- make.unique(as.character(rows))
        }
        if (has.j && anyDuplicated(nm <- names(x))) 
            names(x) <- make.unique(nm)
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")[i]
        attr(x, "row.names") <- rows  ## this is where the rownames of x changed
        oldClass(x) <- oldClass(xx)
    }
    x
}

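So when we create y with [.data.frame, it receives a row.names attribute that differs from that of x, whose row.names are automatic and are shown with a negative sign in the dput output.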
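If the goal is simply to make y behave like x again, one option (not mentioned in the original answer, so treat it as a suggestion rather than the author's method) is to reset the row names of y. Assigning NULL to the row names of a data frame restores the automatic form:

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

.row_names_info(y)    # positive before the reset

rownames(y) <- NULL   # resets y to automatic row names
.row_names_info(y)    # negative after the reset

identical(t(x), t(y)) # the transposes now agree
```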

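Edit
In fact, this is stated in the manual (the Note in ?row.names):
Note
row.names is similar to rownames for arrays, and it has a method that
calls rownames for an array argument.
Row names of the form 1:n for n > 2 are stored internally in a compact
form, which might be seen from C code or by deparsing but never via
row.names or attr(x, "row.names"). Additionally, some names of this
sort are marked as 'automatic' and handled differently by as.matrix
and data.matrix (and potentially other functions).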
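So attr does not distinguish between automatic row names (as in x) and explicit integer row names (as in y), whereas as.matrix does distinguish them, through the internal representation reported by .row_names_info.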
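A short sketch of that contrast, using the same x and y: attr() reports the expanded row names for both objects, while .row_names_info() with type 0L returns the stored form, which matches what dput shows.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# attr() expands the compact form, so the difference is hidden
identical(attr(x, "row.names"), attr(y, "row.names"))

# type 0L returns the internal representation
internal_x <- .row_names_info(x, 0L)   # c(NA, -3L): automatic
internal_y <- .row_names_info(y, 0L)   # c(NA,  3L): compact but not automatic
internal_x
internal_y
```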
identical is too lenient by default, but you can change that:
> identical(x, y, attrib.as.set = FALSE)
[1] FALSE

The reason can be found by inspecting the objects in more detail:
> dput(x)
structure(list(x = 1:3, y = 11:13), .Names = c("x", "y"), row.names = c(NA,
-3L), class = "data.frame")
> dput(y)
structure(list(x = 1:3, y = 11:13), .Names = c("x", "y"), row.names = c(NA,
3L), class = "data.frame")

Note the different row.names attribute:
> .row_names_info(x)
[1] -3
> .row_names_info(y)
[1] 3

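From the documentation we can gather that a negative number means automatic row names (for x), whereas the row names of y are not automatic. as.matrix treats the two cases differently.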
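It seems to be the way the row names are defined, since they differ between dput(x) and dput(y). They are probably added explicitly when [.data.frame is used.

You can run dput(x) and dput(y) and you will see that row.names are stored differently. I think this has to do with automatic row.names handling (see the Details section for more information). I don't know why subsetting returns different row.names, but honestly it smells like unintended behaviour.

identical(x, y, attrib.as.set = FALSE) does seem to detect the difference (note ?identical: "Note that identical(x, y, FALSE, FALSE, FALSE, FALSE) pickily tests for exact equality.").

as.matrix behaves differently because of its call to .row_names_info, and, as @digEmAll pointed out, the difference comes from the automatic row names in x. As written, as.matrix drops automatic row names so that they do not end up as row names in the matrix.

It is worth noting that attr(x, "row.names") and attr(x, "row.names") = value do not reveal how R handles the special case of "row.names" internally; .row_names_info is more accurate. For example, attr(x, "row.names") = 1:3 does not store 1:3 as "row.names", but a different form, as shown by .row_names_info(x, 0). However, anything other than NULL will mark the object as …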
> attr(x, 'row.names')
[1] 1 2 3
