sapply vs. lapply when reading files and rbind'ing them

I'm following Hadley's approach of reading multiple CSV files and then combining them into a single data frame. I also experimented with lapply versus sapply.
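A minimal sketch of that read-and-combine pattern, assuming two small hypothetical files (File1.csv, File2.csv) written to a temporary directory so the example runs anywhere: lapply reads each file into a data frame, and do.call(rbind, ...) stacks the resulting list into one data frame.

```r
# Hypothetical demo files in a temporary directory (not the poster's data).
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(Name = c("A", "C"), Income = c(55L, 23L)),
          file.path(csv_dir, "File1.csv"), row.names = FALSE)
write.csv(data.frame(Name = c("A", "A"), Income = c(34L, 45L)),
          file.path(csv_dir, "File2.csv"), row.names = FALSE)

# List the CSV files, read each into a data frame (lapply returns a list),
# then hand that list to rbind as separate arguments via do.call.
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(paths, read.csv))
print(combined)
```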

Here is my first CSV file:

dput(File1)
structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", 
"C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 
2L, 2L), .Label = c("B", "D"), class = "factor"), Income = c(55L, 
23L, 34L, 45L, 44L), Tax = c(23L, 21L, 22L, 24L, 25L), Location = structure(c(3L, 
3L, 1L, 4L, 2L), .Label = c("Americas", "AP", "EMEA", "LATAM"
), class = "factor")), .Names = c("First.Name", "Last.Name", 
"Income", "Tax", "Location"), class = "data.frame", row.names = c(NA, 
-5L))

Here is my second CSV file:

dput(File2)
structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", 
"C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 
2L, 2L), .Label = c("B", "D"), class = "factor"), Income = c(55L, 
55L, 55L, 55L, 55L), Tax = c(24L, 24L, 24L, 24L, 24L), Location = structure(c(3L, 
3L, 1L, 4L, 2L), .Label = c("Americas", "AP", "EMEA", "LATAM"
), class = "factor")), .Names = c("First.Name", "Last.Name", 
"Income", "Tax", "Location"), class = "data.frame", row.names = c(NA, 
-5L))

Here is my code:

dat1 <-",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,23,EMEA\n2,C,D,23,21,EMEA\n3,A,D,34,22,Americas\n4,A,D,45,24,LATAM\n5,A,D,44,25,AP"
dat2 <-",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,24,EMEA\n2,C,D,55,24,EMEA\n3,A,D,55,24,Americas\n4,A,D,55,24,LATAM\n5,A,D,55,24,AP"

tc1 <- textConnection(dat1)
tc2 <- textConnection(dat2)

merged_file <- do.call(rbind, lapply(list(tc1,tc2), read.csv))
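The comments below note that reading from textConnection() keeps the example reproducible. A related option, sketched here as an alternative rather than what the poster ran, is read.csv(text = ...), which parses a string directly. A textConnection is consumed by its first read, whereas the string can be parsed again and again, which makes it easy to rerun the lapply and sapply versions on the same data. Note that the empty first field of the header is read in as a column named X, so each parsed string has 6 columns.

```r
# The same two CSV strings as in the question.
dat1 <- ",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,23,EMEA\n2,C,D,23,21,EMEA\n3,A,D,34,22,Americas\n4,A,D,45,24,LATAM\n5,A,D,44,25,AP"
dat2 <- ",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,24,EMEA\n2,C,D,55,24,EMEA\n3,A,D,55,24,Americas\n4,A,D,55,24,LATAM\n5,A,D,55,24,AP"

# Parse each string directly; there is no connection to exhaust,
# so this line can be re-run as often as needed.
merged_file <- do.call(rbind, lapply(list(dat1, dat2), function(s) read.csv(text = s)))

dim(merged_file)    # 10 6
names(merged_file)  # "X" "First.Name" "Last.Name" "Income" "Tax" "Location"
```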

Here is the output I get when I use sapply in place of lapply:

    [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    1    1    1
 [2,]    1    2    2    2    2
 [3,]   55   23   34   45   44
 [4,]   23   21   22   24   25
 [5,]    3    3    1    4    2
 [6,]    1    2    1    1    1
 [7,]    1    2    2    2    2
 [8,]   55   55   55   55   55
 [9,]   24   24   24   24   24
[10,]    3    3    1    4    2

I'd really appreciate your help. I'm fairly new to R and don't understand what's going on.

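This problem has nothing to do with factors; it's the general sapply vs. lapply difference. Why does sapply get it so wrong while lapply gets it right? Remember that in R a data frame is a list of columns, and each column can have a different type.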
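The point that a data frame is a list of columns, each with its own type, can be checked directly. A small sketch with a made-up data frame; the last line shows that converting to a matrix forces a single type on everything:

```r
df <- data.frame(
  Name   = factor(c("A", "C", "A")),
  Income = c(55L, 23L, 34L),
  Tax    = c(23.5, 21, 22)
)

is.list(df)                      # TRUE: a data frame is a list...
length(df)                       # 3: ...with one element per column
vapply(df, class, character(1))  # Name "factor", Income "integer", Tax "numeric"

# A matrix has one type for all cells, so mixed columns collapse to character.
typeof(as.matrix(df))            # "character"
```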

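  • lapply returns a list of data frames, so do.call(rbind, ...) passes whole data frames to rbind, and the rbind.data.frame method concatenates them correctly: it matches columns by name and keeps corresponding columns together. That is why your factors come out right.
  • sapply, however, tries to simplify its result:
    • Each data frame is a list of columns, so sapply takes the data frames apart and packs their columns into a matrix with one whole column per cell (rows are the column names, columns are the files). The columns now sit where the rows should be.
    • Worse, do.call(rbind, ...) then passes each of those cells to rbind as a separate argument, so rbind stacks every column as a row: 2 files x 5 columns gives the 10 x 5 result you see.
    • None of those arguments is a data frame, so rbind builds a plain matrix, which can hold only one type. The data are coerced to numbers and your factors are reduced to their integer codes (garbage!). That is why they come out mangled.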
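The mechanism above can be reproduced on a small scale. This sketch uses two made-up 2-row data frames (hypothetical, not the poster's files) and identity as a stand-in for read.csv; the factor columns are built explicitly so the result does not depend on the stringsAsFactors default.

```r
f1 <- data.frame(First.Name = factor(c("A", "C")), Income = c(55L, 23L))
f2 <- data.frame(First.Name = factor(c("A", "A"), levels = c("A", "C")),
                 Income = c(55L, 55L))

# sapply simplifies the list of two data frames into a matrix whose cells
# are whole columns: rows = column names, columns = files.
s <- sapply(list(f1, f2), identity)
is.matrix(s)  # TRUE
typeof(s)     # "list"
dim(s)        # 2 2

# do.call hands the four cells to rbind one by one, so each column becomes
# a row, and the factor columns are stored as their integer codes.
m <- do.call(rbind, s)
# m is a 4 x 2 integer matrix with rows: 1 2 / 55 23 / 1 1 / 55 55
m
```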

Summary: just use lapply.
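If sapply is wanted anyway, its simplify = FALSE argument turns off the simplification step, so it returns the same plain list as lapply. A sketch with made-up data frames, again using identity as a stand-in for read.csv:

```r
lv <- c("Americas", "AP", "EMEA", "LATAM")
f1 <- data.frame(Location = factor(c("EMEA", "AP"), levels = lv), Income = c(55L, 44L))
f2 <- data.frame(Location = factor(c("LATAM", "EMEA"), levels = lv), Income = c(55L, 55L))
dfs <- list(f1, f2)

# Both calls give rbind a list of whole data frames, so rbind.data.frame is used.
via_lapply <- do.call(rbind, lapply(dfs, identity))
via_sapply <- do.call(rbind, sapply(dfs, identity, simplify = FALSE))

identical(via_lapply, via_sapply)  # TRUE
via_lapply$Location                # still a factor: EMEA AP LATAM EMEA
```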

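Why change lapply to sapply? lapply is the appropriate function here, and it is more efficient. By the way, paste is vectorized.

@RichScriven - I'm just trying to understand why the output is different when I use sapply instead of lapply.

"Although it works fine": as a reproducible example it doesn't work at all. We don't have your paths, so it fails. It's easiest to read the data frames from textConnection() rather than from files. I edited your code.

This question has nothing to do with factors; it's the generic sapply vs. lapply issue. Duplicate.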
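The remark that paste is vectorized refers to code not shown here. As a hedged illustration with hypothetical file names, a single paste0 call builds every name at once, with no loop needed:

```r
# paste0 recycles its arguments element-wise, so one call builds all names.
files <- paste0("File", 1:3, ".csv")
files  # "File1.csv" "File2.csv" "File3.csv"

# The loop it replaces, for comparison only.
files_loop <- character(3)
for (i in 1:3) files_loop[i] <- paste0("File", i, ".csv")
identical(files, files_loop)  # TRUE
```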