R: Reading ASCII fixed-length fields/records with two record delimiters
Tags: r, excel, ascii, stata, census


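I apologize if this is too broad, but I am having trouble loading archived Census County Business Patterns data. The file format is described as

ASCII, fixed-length fields/records, with two record delimiters (carriage return and line feed); record length includes delimiters

I have tried loading it into Excel, R, and Stata. I just want to get the file into a readable format so I can work with it later. I tried reading it into R with read.fwf, but it is really unclear to me what widths I should use. I don't know much about the file type: I am not very familiar with ASCII files, and the file extension doesn't tell me much either. Any suggestions would be greatly appreciated. I have provided a link below to a set of the files I am trying to use.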


Some rows contain malformed data, so I suggest using read_delim from the readr package:

library(readr)
data <- read_delim("https://catalog.archives.gov/OpaAPI/media/873805/content/arcmedia/electronic-records/rg-029/cbp-files/RG029.CBP85.T2I1?download=true",
                   delim = " ", col_names = FALSE)
data
## A tibble: 32,970 x 6
#      X1 X2    X3                                                                                                                   #X4    X5 X6    
#   <dbl> <chr> <chr>                                                                                                             #<dbl> <dbl> <chr> 
# 1 11001 ----  " 00000003482800000011566200000049404800248400140400048300029200018300006600003400001600000500000100000100000000… 11000 23001 " 424…
# 2 11001 07--  "B00000000000000000000000000000000000000001700001200000300000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 3 11001 0700  "B00000000000000000000000000000000000000001500001000000300000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 4 11001 0720  "B00000000000000000000000000000000000000000200000000000000000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 5 11001 0740  " 00000000001900000000003500000000019100000500000300000200000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 6 11001 0750  "A00000000000000000000000000000000000000000100000100000000000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 7 11001 0780  " 00000000001200000000002700000000035100000600000500000100000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 8 11001 0800  "A00000000000000000000000000000000000000000200000200000000000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 9 11001 10--  "A00000000000000000000000000000000000000000200000100000000000100000000000000000000000000000000000000000000000000… 11000 23001 " 424…
#10 11001 1400  "A00000000000000000000000000000000000000000200000100000000000100000000000000000000000000000000000000000000000000… 11000 23001 " 424…
## … with 32,960 more rows
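
The long X3 column above looks like many fixed-width fields packed together, which is where read.fwf from the question would come in. The following is only a minimal sketch on made-up records: the widths, column names, and sample values are hypothetical, and the real ones have to come from the record layout documentation that accompanies the files.

```r
# Two made-up fixed-width records (NOT taken from the real file).
sample_lines <- c("11001----A0000003482",
                  "110010700B0000000017")

# Hypothetical layout: 5-char area code, 4-char industry code,
# 1-char flag, 10-digit count. The widths must sum to the number of
# data characters in each record (20 here).
widths    <- c(5, 4, 1, 10)
col_names <- c("area", "industry", "flag", "count")

# Read everything as character first so leading zeros and codes such
# as "----" are preserved; convert numeric fields afterwards.
parsed <- read.fwf(textConnection(sample_lines),
                   widths = widths,
                   col.names = col_names,
                   colClasses = "character",
                   comment.char = "")
parsed$count <- as.numeric(parsed$count)

print(parsed)
```

Reading every field as character and converting selectively avoids silently turning codes like 0700 into the number 700.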

In base R, we can use read.table with fill = TRUE:

data <- read.table("https://catalog.archives.gov/OpaAPI/media/873805/content/arcmedia/electronic-records/rg-029/cbp-files/RG029.CBP85.T2I1?download=true", fill = TRUE)
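
Before settling on widths, it may help to check whether every record really has the same length. This is a rough sketch on a made-up two-record file written to a temporary path; the contents are hypothetical stand-ins for the downloaded file.

```r
# Made-up content: two 20-character records, each ending in CR LF.
raw_text <- "11001----A0000003482\r\n110010700B0000000017\r\n"
tmp <- tempfile()
writeBin(charToRaw(raw_text), tmp)

# readLines() accepts LF, CRLF or CR as the end-of-line marker, so the
# returned strings contain only the data characters.
lines <- readLines(tmp)

# Tabulate line lengths: a single value suggests a clean fixed-width
# file, while several values point at malformed rows.
length_counts <- table(nchar(lines))
print(length_counts)

# The documented record length includes the two delimiter characters.
data_len   <- unique(nchar(lines))
record_len <- data_len + 2L

unlink(tmp)
```

If the table shows more than one length, the odd rows can be inspected or dropped before passing the rest to a fixed-width reader.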
