
How to tell readr::read_csv to guess a double column correctly

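I have runoff data with many zero values and only occasional non-zero double values.

`readr::read_csv` guesses an integer column type because of the many zeros.

How can I make `read_csv` guess the correct double column type? I do not know the mapping of variable names beforehand, so I cannot supply a name-to-type mapping.

Here is a small example: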

  library(readr)
  # create a column of doubles with many zeros (runoff data)
  #dsTmp <- data.frame(x = c(rep(0.0, 2), 0.5)) # this works
  dsTmp <- data.frame(x = c(rep(0.0, 1e5), 0.5))
  dir.create("tmp", showWarnings = FALSE)  # write_csv() does not create the folder
  write_csv(dsTmp, "tmp/dsTmp.csv")
  # 0.0 is written as 0 
  # read_csv now guesses integer instead of double and reports 
  # a parsing failure. 
  ans <- read_csv("tmp/dsTmp.csv")
  # the last value is NA instead of 0.5
  tail(ans)
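readr decides each column's type from the first `guess_max` rows (1000 by default), so a column whose first thousand values are all `0` never reveals the `0.5` at the end. The sketch below reproduces the example with a temporary file instead of `tmp/` and inspects the specification readr settled on with `spec()`. The variable names are illustrative, and depending on the readr version a column of `0`s may be guessed as double rather than integer, in which case the problem does not reproduce.

```r
library(readr)

# Same shape as the runoff example: 100,000 zeros followed by one 0.5
dsTmp <- data.frame(x = c(rep(0.0, 1e5), 0.5))
tmpFile <- tempfile(fileext = ".csv")  # illustrative temporary path
write_csv(dsTmp, tmpFile)

# col_guess() for every column keeps the console quiet while still guessing
ans <- read_csv(tmpFile, col_types = cols(.default = col_guess()))

# The column specification readr chose after guessing
guessed <- spec(ans)
print(guessed)

# TRUE when x was guessed as integer (the failure described in the question)
guessedInteger <- inherits(guessed$cols$x, "collector_integer")
print(guessedInteger)
```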
Here are two techniques. (Data prep is at the bottom. `$hp`, as well as `$vs` and the columns after it, are integer columns.)

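Note: I add `cols(.default = col_guess())` to most of the first calls so that we don't get the large message listing the column types `read_csv` found. It can be omitted at the cost of a noisier console.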

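  • Use the `cols(.default = ...)` setting to force all columns to double. This works safely as long as you know there are no non-numbers in the file (here the character column `cyl` becomes `NA`):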

    read_csv("mtcars.csv", col_types = cols(.default = col_double()))
    # Warning in rbind(names(probs), probs_f) :
    #   number of columns of result is not a multiple of vector length (arg 1)
    # Warning: 32 parsing failures.
    ### ...snip...
    # See problems(...) for more details.
    # # A tibble: 32 x 11
    #      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #  1  21      NA  160    110  3.9   2.62  16.5     0     1     4     4
    #  2  21      NA  160    110  3.9   2.88  17.0     0     1     4     4
    #  3  22.8    NA  108     93  3.85  2.32  18.6     1     1     4     1
    #  4  21.4    NA  258    110  3.08  3.22  19.4     1     0     3     1
    #  5  18.7    NA  360    175  3.15  3.44  17.0     0     0     3     2
    #  6  18.1    NA  225    105  2.76  3.46  20.2     1     0     3     1
    #  7  14.3    NA  360    245  3.21  3.57  15.8     0     0     3     4
    #  8  24.4    NA  147.    62  3.69  3.19  20       1     0     4     2
    #  9  22.8    NA  141.    95  3.92  3.15  22.9     1     0     4     2
    # 10  19.2    NA  168.   123  3.92  3.44  18.3     1     0     4     4
    # # ... with 22 more rows
    
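  • Change only the columns guessed as integer to double, by reading with a modified column specification `types`. In this last read, notice that `$hp` and the columns after it are now `<dbl>` (unlike the data-prep read below):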

    read_csv("mtcars.csv", col_types = types)
    # # A tibble: 32 x 11
    #      mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
    #    <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #  1  21   c6     160    110  3.9   2.62  16.5     0     1     4     4
    #  2  21   c6     160    110  3.9   2.88  17.0     0     1     4     4
    #  3  22.8 c4     108     93  3.85  2.32  18.6     1     1     4     1
    #  4  21.4 c6     258    110  3.08  3.22  19.4     1     0     3     1
    #  5  18.7 c8     360    175  3.15  3.44  17.0     0     0     3     2
    #  6  18.1 c6     225    105  2.76  3.46  20.2     1     0     3     1
    #  7  14.3 c8     360    245  3.21  3.57  15.8     0     0     3     4
    #  8  24.4 c4     147.    62  3.69  3.19  20       1     0     4     2
    #  9  22.8 c4     141.    95  3.92  3.15  22.9     1     0     4     2
    # 10  19.2 c6     168.   123  3.92  3.44  18.3     1     0     4     4
    # # ... with 22 more rows
    

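The `types` specification passed to `read_csv("mtcars.csv", col_types = types)` above is not built anywhere in this answer. Below is a minimal sketch of one way to build it, following the same steps as the `read_csvDouble` function further down (which its author says is transcribed from this answer): guess the types from a few rows, then swap integer collectors for `col_double()`. It uses base `vapply()` and `inherits()` in place of purrr's `map_lgl()` and `identical()`, and the temporary file and variable names are illustrative assumptions.

```r
library(readr)

# Illustrative stand-in for the data prep: mtcars with a character cyl column
mt <- mtcars
mt$cyl <- paste0("c", mt$cyl)
csvFile <- tempfile(fileext = ".csv")
write_csv(mt, csvFile)

# 1. Guess the column types cheaply from the first few rows only
types <- spec(read_csv(csvFile, n_max = 3,
                       col_types = cols(.default = col_guess())))

# 2. Find the columns guessed as integer and replace them with col_double()
isInt <- vapply(types$cols, inherits, logical(1), what = "collector_integer")
types$cols[isInt] <- replicate(sum(isInt), col_double(), simplify = FALSE)

# 3. Read the whole file with the adjusted specification
out <- read_csv(csvFile, col_types = types)
print(out)
```

`simplify = FALSE` keeps `replicate()` returning a plain list of collectors, which is what `types$cols` holds, and it also behaves when no integer columns are found.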
`data.table::fread` seems to handle this well:

    library(data.table)
    write_csv(dsTmp, ttfile <- tempfile())
    ans <- fread(ttfile)
    tail(ans)
    #      x
    # 1: 0.0
    # 2: 0.0
    # 3: 0.0
    # 4: 0.0
    # 5: 0.0
    # 6: 0.5
    

I transferred the code of r2evans's solution into a small function:

    library(readr)
    library(magrittr)  # %>%
    library(purrr)     # map_lgl()

    read_csvDouble <- function(
      ### read_csv but read guessed integer columns as double
      ... ##<< further arguments to \code{\link{read_csv}}
      , n_max = Inf        ##<< see \code{\link{read_csv}}
      , col_types = cols(.default = col_guess()) ##<< see \code{\link{read_csv}}
      ## the default suppresses the type guessing messages
    ){
      ##details<< Sometimes, double columns are guessed as integer, e.g. with
      ## runoff data where there are many zeros and only occasional
      ## positive values that can be recognized as double.
      ## This function modifies \code{read_csv} by changing guessed integer
      ## columns to double columns.
      #https://stackoverflow.com/questions/52934467/how-to-tell-readrread-csv-to-guess-double-column-correctly
      colTypes <- read_csv(..., n_max = 3, col_types = col_types) %>% attr("spec")
      isIntCol <- map_lgl(colTypes$cols, identical, col_integer())
      colTypes$cols[isIntCol] <- replicate(sum(isIntCol), col_double())
      ##value<< tibble as returned by \code{\link{read_csv}}
      ans <- read_csv(..., n_max = n_max, col_types = colTypes)
      ans
    }
    

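You could try increasing the `guess_max` argument so that it looks further into the file for values before guessing.

Could you try `data.table::fread()`? Is there a reason `read.csv()` is not an option?

@12b345b6b78 Base R's `read.csv` is slow.

Thanks, r2evans. Your solution 2 solved my problem. I transferred your code into a small function.

`data.table::fread` does work well. But I don't like adding more package dependencies, and `read_csv` is already used in many places in the project.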
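Following the comment's suggestion to raise `guess_max`: a minimal sketch that tells the guesser to look at every row of the question's data, so the trailing `0.5` is seen before the type is fixed. It assumes the row count (or an upper bound on where the first non-integer value appears) is known; scanning more rows is slower on large files. `tmpFile` is an illustrative name.

```r
library(readr)

# Question's data: 100,000 zeros, then one non-integer value at the very end
dsTmp <- data.frame(x = c(rep(0.0, 1e5), 0.5))
tmpFile <- tempfile(fileext = ".csv")
write_csv(dsTmp, tmpFile)

# Guess from all data rows instead of the default first 1000
ans <- read_csv(tmpFile, guess_max = nrow(dsTmp),
                col_types = cols(.default = col_guess()))
tail(ans)
```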
Data preparation for the two techniques above (writes `mtcars.csv`):

    library(readr)
    mt <- mtcars
    mt$cyl <- paste0("c", mt$cyl) # for fun
    write_csv(mt, "mtcars.csv")
    read_csv("mtcars.csv", col_types = cols(.default = col_guess()))
    # # A tibble: 32 x 11
    #      mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
    #    <dbl> <chr> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
    #  1  21   c6     160    110  3.9   2.62  16.5     0     1     4     4
    #  2  21   c6     160    110  3.9   2.88  17.0     0     1     4     4
    #  3  22.8 c4     108     93  3.85  2.32  18.6     1     1     4     1
    #  4  21.4 c6     258    110  3.08  3.22  19.4     1     0     3     1
    #  5  18.7 c8     360    175  3.15  3.44  17.0     0     0     3     2
    #  6  18.1 c6     225    105  2.76  3.46  20.2     1     0     3     1
    #  7  14.3 c8     360    245  3.21  3.57  15.8     0     0     3     4
    #  8  24.4 c4     147.    62  3.69  3.19  20       1     0     4     2
    #  9  22.8 c4     141.    95  3.92  3.15  22.9     1     0     4     2
    # 10  19.2 c6     168.   123  3.92  3.44  18.3     1     0     4     4
    # # ... with 22 more rows
    