How to select based on (multiple) values when reading a .csv in chunks in R

r for-loop gis


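I have a ~26 GB file of 25 years of climate data, which is too large to process in R. The variables are: year (X1992), month (X7), metric (tave), conus, latitude (X24.5625), longitude (X.81.8125), and measurement (X29.49).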

head(df.laf)
     X1992 X7  tave conus  X24.5625  X.81.8125 X29.49
1    1992  7   tave conus  24.5625   -81.7708  29.46

50 rows of sample data:

dput(droplevels(head(data, 50)))

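This dataset represents climate data for the entire United States. I am only interested in analysing one specific region. To shrink the file to a manageable size, I would like to read it into R in chunks while filtering on longitude/latitude values.

Below is my code, which (1) reads the file in chunks of 1e6 rows using LaF and (2) filters on latitude (column name X24.5625) for the value 42.3542, using a for loop.

When I run this code, the resulting data frame res is a table with no data in it. Ultimately, my goal is to write a for loop that successfully returns a table with the specified longitude/latitude values and that iterates over every row of the dataset in chunks of 1e6 rows.

My questions are:

1. Why am I getting an empty table instead of a table with the values that correspond to the specified latitude?
2. How can I make sure the loop runs over the entire dataset?

I am a new R user, so I would benefit from commented code if possible.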

#read dataframe in chunks
library('LaF')
quatcent <- '1972_2017.csv'

#create column names
quatcent_colnames <- c("year", "month", "metric", "conus", "latitude", 
"longitude", "measurement")

#detect a model for file:
model <- detect_dm_csv(quatcent, sep=",", header=TRUE)

#create connection to file using model:
df.laf <- laf_open(model)

# go to a specified place in the file (in this case, row 1)
goto(df.laf, 1)
data <- next_block(df.laf,nrows=1e6)
names(data) <-
c("year","month","metric","conus","latitude","longitude","measurement")


# create a for loop to subset by long/lat value
library('dplyr')
library('stringr')

res <- df.laf[1,][0,]
for(i in 1:10){
  raw <-
    next_block(df.laf,nrows=100e6) %>% 
    filter(str_detect("X24.5625","42.3542"))
  res <- rbind(res, raw)

}
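One likely reason for the empty table: str_detect("X24.5625", "42.3542") compares two fixed strings rather than the values of a column, so it yields a single FALSE and filter() then drops every row of every chunk. The loop also runs a fixed 10 times instead of running until the file is exhausted. The following is a minimal sketch of a loop that filters on the columns themselves and stops at the end of the file. It assumes the file has no header row (the X-prefixed names in head(df.laf) suggest the first data row was read as a header), that fields are unquoted, and that the column types are as listed. The helper name filter_csv_region and its bounding-box arguments are hypothetical and not part of LaF.

```r
library(LaF)

# Hypothetical helper: keep only rows inside a latitude/longitude box,
# reading the CSV block by block so the whole file never sits in memory.
filter_csv_region <- function(path, lat_range, lon_range, chunk_rows = 1e6) {
  col_names <- c("year", "month", "metric", "conus",
                 "latitude", "longitude", "measurement")
  # Assumed column types; adjust them if the real file differs.
  col_types <- c("integer", "integer", "string", "string",
                 "double", "double", "double")

  laf <- laf_open_csv(path, column_types = col_types,
                      column_names = col_names, sep = ",")
  begin(laf)  # position the reader at the first row

  kept <- list()  # collect filtered chunks; bind them once at the end
  repeat {
    block <- next_block(laf, nrows = chunk_rows)
    if (nrow(block) == 0) break  # a zero-row block signals end of file

    # Compare the numeric columns, not the column names
    inside <- block$latitude  >= lat_range[1] & block$latitude  <= lat_range[2] &
              block$longitude >= lon_range[1] & block$longitude <= lon_range[2]
    kept[[length(kept) + 1]] <- block[inside, , drop = FALSE]
  }
  do.call(rbind, kept)  # NULL if no chunk was read
}
```

To target a single latitude such as 42.3542, a narrow range like c(42.354, 42.355) avoids relying on exact floating-point equality. Collecting the chunks in a list and binding once also avoids repeatedly copying res inside the loop.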

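It might be a good idea to post sample data here using dput. 50 rows would do.

You could try reading in chunks of 10 rows to check whether your code works. Also, your code will be very slow because you grow res inside the for loop.

Thanks, sample data added.

I suggest using sed or awk, or writing a small C program, to read the file and extract the rows and columns you need into a new file, and then using R on that file. One advantage is that you do not have to read the big file again every time you want to process the subset. A C program would also let you write the output file using int, long, float, or double, which would reduce the size of the output file further.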
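The extract-once idea from the last comment can also be sketched in base R, for readers who prefer to stay in R rather than use sed, awk, or C as suggested. This is only an illustration under assumptions: the file has no header row, fields are comma-separated and unquoted, and latitude and longitude are fields 5 and 6. The helper name extract_region_to_file and its arguments are hypothetical.

```r
# Hypothetical helper: stream a large CSV and copy matching rows to a
# smaller file, so later analysis never has to touch the big file again.
extract_region_to_file <- function(in_path, out_path, lat_range, lon_range,
                                   chunk_lines = 1e5) {
  con_in  <- file(in_path, open = "r")
  con_out <- file(out_path, open = "w")
  on.exit({ close(con_in); close(con_out) })

  n_written <- 0
  repeat {
    lines <- readLines(con_in, n = chunk_lines)
    if (length(lines) == 0) break  # end of file

    # Assumed layout: latitude is field 5, longitude is field 6
    fields <- strsplit(lines, ",", fixed = TRUE)
    lat <- as.numeric(vapply(fields, `[`, character(1), 5))
    lon <- as.numeric(vapply(fields, `[`, character(1), 6))

    keep <- !is.na(lat) & !is.na(lon) &
      lat >= lat_range[1] & lat <= lat_range[2] &
      lon >= lon_range[1] & lon <= lon_range[2]
    writeLines(lines[keep], con_out)  # rows are copied unchanged
    n_written <- n_written + sum(keep)
  }
  n_written  # number of rows written to out_path
}
```

Starting with a small chunk_lines value on a short test file, as suggested in the comments, is a quick way to confirm the filter behaves as expected before running it on the full 26 GB file.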
