Working with large CSV files in R

Tags: r, bigdata, large-files

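Any help would be greatly appreciated.

I used the code below to break up my large CSV file (4 GB), and now I am trying to save the second, third, and later chunks to CSV. However, I can only access the first chunk of the data.

Is there something wrong with my code? How can I save the second chunk of data to a CSV file?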

rgfile <- 'filename.csv' 

index <- 0  

chunkSize <- 100000

con <- file(description = rgfile, open="r")

dataChunk <- read.table(con, nrows= chunkSize, header=T, fill= TRUE, sep= ",")

actualColumnNames <- names(dataChunk)

repeat {

  index <- index + 1 

  print(paste('Processing rows:', index * chunkSize)) 

  if (nrow(dataChunk) != chunkSize){
    print('Processed all files!')
    break
  }

  dataChunk <- read.table(
    con, nrows = chunkSize, skip=0, header = FALSE, 
    fill=TRUE, sep = ",", col.names=actualColumnNames
  ) 

  break

}

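You are overwriting dataChunk each time through the loop. Do you want to write each chunk? If so, there should be a write.csv statement after the read.table. Also, skip needs to be set to the first row to start reading from on each pass through the loop. Use the nrows argument to set the total number of rows to read. Consider the readr package, or fread from the data.table package. Both are much faster than read.table or read.csv.

Thank you for the reply, I really appreciate it. You are right, I am trying to write each chunk to its own file. If you don't mind, could you share how to incorporate the write.csv code? I have tried to save it many times but keep getting an error.

What is the error?

Error: unexpected input in "dChunk=read.table(conn, nrows=chunk, skip=0, header=FALSE, fill=TRUE, sep=",", col.names=actualColumnNames) write.csv(dChunk, file='" >> break Error: no loop for break/next, jumping to top level >> } Error: unexpected '}' in "}"

(Please do not include large amounts of code or console output in comments, for two reasons: (1) it can be hard to read, especially when it spans multiple lines; (2) readers looking to solve their own problem will not always see all the comments. Please edit your question and insert the error there.)
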
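The suggestion in the comments (write each chunk right after reading it) can be sketched in base R roughly as follows. This is a minimal sketch, not the asker's or the commenters' exact code: the function name split_csv, its prefix argument, and the output file naming are all hypothetical. It relies on the fact that reading from a connection that is already open continues where the previous read stopped, so no skip value is needed in this variant. It also assumes no quoted field contains an embedded newline.

```r
# Hypothetical helper: split a CSV into numbered chunk files using base R only.
split_csv <- function(path, chunk_size = 100000L, prefix = "chunk") {
  # Open the connection once; each read below continues from the
  # position where the previous read stopped.
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)

  # The first read consumes the header line plus the first chunk of rows.
  chunk <- read.csv(con, nrows = chunk_size, header = TRUE,
                    check.names = FALSE)
  col_names <- names(chunk)

  out_files <- character(0)
  repeat {
    # Nothing (left) to write.
    if (nrow(chunk) == 0L) break

    # Write the chunk that was just read before fetching the next one.
    out <- sprintf("%s_%03d.csv", prefix, length(out_files) + 1L)
    write.csv(chunk, out, row.names = FALSE)
    out_files <- c(out_files, out)

    # A short chunk means the end of the file was reached.
    if (nrow(chunk) < chunk_size) break

    # Later reads have no header line, so the column names are reused.
    # read.csv may raise an error when no lines remain; treat that as the end.
    chunk <- tryCatch(
      read.csv(con, nrows = chunk_size, header = FALSE,
               col.names = col_names, check.names = FALSE),
      error = function(e) NULL
    )
    if (is.null(chunk)) break
  }

  out_files
}
```

Under these assumptions, calling split_csv("filename.csv", chunk_size = 100000L) would write chunk_001.csv, chunk_002.csv, and so on to the working directory and return their names. Column types are guessed per chunk, which is harmless when the chunks are only written back out as CSV.
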
library(tidyverse)
library(nycflights13)

# make the problem reproducible
rgfile <- 'flights.csv' 
write_csv(flights, rgfile)

# now, get to work

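# number of data rows, excluding the header line
n_rows <- as.numeric(R.utils::countLines(rgfile)) - 1

chunk_size <- 100000

# read a couple of rows just to capture the column names
hdr <- read_csv(rgfile, n_max=2)

fnum <- 1

for (i in seq(1, n_rows, chunk_size)) {

  # skip the header line plus the (i - 1) data rows already written;
  # because col_names is supplied, the first line read is treated as data
  suppressMessages(
    read_csv(
      rgfile, col_names=colnames(hdr), skip=i, n_max=chunk_size
    )
  ) -> x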

  write_csv(x, sprintf("file%03d.csv", fnum))

  fnum <- fnum + 1

}