How to import an ASCII file with mixed tab and comma delimiters in R
I have an ASCII file containing a set of MODIS data, with a series of pixel values for each acquisition date. The format of the data is: the ASCII values are comma-delimited, and the data values start after the header lines and are space-delimited. An example of two dates from the data is shown below:
----------------------------------------------------------------------------
MODIS HDF Tile MOD13Q1.A2003273.h11v03.005.2008260032604.hdf
Scientific Data Set (Band) 250m_16_days_EVI
Number of Values Passing QA Filter 81 of 81
Applying the Scale of .0001 MEAN: 0.24070987654321, STD-DEV: 0.0257345931611507
Unscaled MEAN: 2407.0987654321, STD-DEV: 257.345931611507
2213,2160,2206,2408,2369,2362,2423,2466,2318,2160,2429,2316,2260,2362,2431,2172,2021,2254,2424,2391,2427,2331,1934,2220,2235,2254,2186,2325,2046,1956,2273,2220,2235,2257,2425,2534,2141,2288,2273,2263,2436,2568,2603,2470,2561,2288,2369,2628,2725,2730,2603,2704,2744,2732,2624,2606,2694,2730,2718,2765,2771,2732,2771,2726,2694,2637,2699,2806,2712,2384,1904,1982,2747,2788,2610,2647,2408,2096,1946,1858,1791
----------------------------------------------------------------------------
MODIS HDF Tile MOD13Q1.A2003289.h11v03.005.2008263131227.hdf
Scientific Data Set (Band) 250m_16_days_EVI
Number of Values Passing QA Filter 81 of 81
Applying the Scale of .0001 MEAN: 0.261756790123457, STD-DEV: 0.0232843291670261
Unscaled MEAN: 2617.56790123457, STD-DEV: 232.843291670261
2074,2323,2382,2574,2614,2661,2631,2599,2525,2399,2548,2545,2541,2599,2415,2428,2417,2518,2549,2471,2539,2520,2407,2358,2426,2461,2575,2427,2412,2518,2500,2394,2509,2567,2569,2648,2414,2573,2498,2626,2509,2708,2694,2654,2702,2536,2750,2804,2917,2926,2942,2938,2844,2839,2863,2985,3006,2991,2997,2937,2830,2838,2607,3101,3093,3085,2950,2881,2608,2570,2499,2233,2912,2833,2819,2348,2426,2541,2243,2239,2071
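A typical ASCII file contains about 900 dates, i.e. 900 blocks of information in exactly the format above, one after another. Every date has the same number of pixels, namely 81.
What I would like to do is read the file and, for each date, extract the MODIS HDF tile name (e.g. MOD13Q1.A2003289.h11v03.005.2008263131227.hdf) and each of the pixel values into individual columns, something like: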
MODIS HDF Tile                                  Scientific Data Set (Band)  V2    V3    V4    V5    V6    V7...
MOD13Q1.A2003273.h11v03.005.2008260032604.hdf   250m_16_days_EVI            2213  2160  2206  2408  2369  .......
MOD13Q1.A2003289.h11v03.005.2008263131227.hdf   250m_16_days_EVI            2074  2323  2382  2574  2614  .....
Any help would be much appreciated.

Perhaps something like this would work:
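# Read the whole file as a character vector, one element per line
modis <- readLines("modis.txt")
# Each block starts with a "MODIS HDF Tile" line
headers <- grep("^MODIS", modis)
# Column labels: the text before the run of two or more whitespace characters
headtiles <- sapply(strsplit(modis[headers[1]], "\\s{2,}"), "[", 1)
headbands <- sapply(strsplit(modis[headers[1] + 1], "\\s{2,}"), "[", 1)
# Values: the text after that run, taken from every block
tiles <- sapply(strsplit(modis[headers], "\\s{2,}"), "[", 2)
bands <- sapply(strsplit(modis[headers + 1], "\\s{2,}"), "[", 2)
# Pixel lines are the ones that contain at least five commas
pxlines <- grep("(,.*?){5,}", modis)
pixels <- do.call(rbind, lapply(strsplit(modis[pxlines], ","), as.numeric))
# One row per date: tile name, band name, then one column per pixel
dd <- data.frame(tiles, bands, pixels)
names(dd) <- c(headtiles, headbands, paste0("pixel", seq.int(ncol(pixels))))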
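Here we find the header lines by running grep over all of the lines, and then assume that the line after each one is the band line. Then we look for the pixel-value lines, which are the ones with lots of commas. This makes a lot of assumptions about the data file, based on the limited sample you provided.

You will probably need to write a custom function that uses readLines to pull in the whole file (unless it is enormous), grep to locate the boundaries between tiles, and then scan/read.table/strsplit to pick out the bits you need. Does each block consist of 8 lines of text plus a "--" separator line?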
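The readLines/grep/scan idea can be sketched as a small function. This is only an illustration under stated assumptions, not code from either answer: the name parse_modis and the output column names (tile, band, pixel1, ...) are made up here, each block is assumed to contain exactly one "MODIS HDF Tile" line, one "Scientific Data Set (Band)" line and one pixel line, and pixel lines are assumed to hold nothing but optionally signed integers and commas. The label is removed with sub() so that it does not matter whether a tab or a run of spaces follows it. The sample pixel rows are shortened to five values to keep the example small.

```r
# Hypothetical helper: turn the lines of a MODIS ASCII dump into a data frame
# with one row per acquisition date.
parse_modis <- function(lines) {
  lines <- trimws(lines)
  tile_idx <- grep("^MODIS HDF Tile", lines)
  band_idx <- grep("^Scientific Data Set \\(Band\\)", lines)
  # Assumption: pixel lines contain only (optionally signed) integers and commas
  px_idx <- grep("^-?[0-9]+(,-?[0-9]+)+$", lines)
  stopifnot(length(tile_idx) == length(band_idx),
            length(tile_idx) == length(px_idx))

  # Drop the fixed label and whatever whitespace (tab or spaces) follows it
  tiles <- sub("^MODIS HDF Tile\\s+", "", lines[tile_idx])
  bands <- sub("^Scientific Data Set \\(Band\\)\\s+", "", lines[band_idx])

  # scan() splits each comma-separated line into a numeric vector
  px_list <- lapply(lines[px_idx],
                    function(x) scan(text = x, sep = ",", quiet = TRUE))
  # Every date must have the same number of pixels
  stopifnot(length(unique(lengths(px_list))) == 1)
  pixels <- do.call(rbind, px_list)
  colnames(pixels) <- paste0("pixel", seq_len(ncol(pixels)))

  data.frame(tile = tiles, band = bands, pixels, stringsAsFactors = FALSE)
}

# Two shortened blocks modelled on the sample in the question
sample_lines <- c(
  "----------------------------------------------------------------------------",
  "MODIS HDF Tile\tMOD13Q1.A2003273.h11v03.005.2008260032604.hdf",
  "Scientific Data Set (Band)\t250m_16_days_EVI",
  "Number of Values Passing QA Filter 81 of 81",
  "2213,2160,2206,2408,2369",
  "----------------------------------------------------------------------------",
  "MODIS HDF Tile   MOD13Q1.A2003289.h11v03.005.2008263131227.hdf",
  "Scientific Data Set (Band)   250m_16_days_EVI",
  "Number of Values Passing QA Filter 81 of 81",
  "2074,2323,2382,2574,2614"
)
dd <- parse_modis(sample_lines)
```

For a real file the call would be parse_modis(readLines("modis.txt")). The stopifnot() checks are there so that a block with a missing line or an unexpected pixel count fails loudly instead of silently misaligning tiles and pixel rows.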