R: reading ASCII fixed-length fields/records with two record separators
Sorry if this is too broad, but I'm having a hard time loading archived Census County Business Patterns data. The file format is described as: ASCII, fixed-length fields/records, with two record separators (carriage return and line feed); the record length includes the separators. I've tried loading it into Excel, R and Stata. I just want to get the file into a readable format so I can work with it later. I tried reading it into R with read.fwf, but I'm really not clear on what widths I should use. I don't know much about this file type: I'm not very familiar with ASCII files, and the file suffix doesn't tell me much either. Any suggestions would be greatly appreciated. I've provided a link below to a set of files I'm trying to use.
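A note on the record separators: because the documented record length counts the carriage return and the line feed, each line holds two fewer characters of actual data, and the file size should be an exact multiple of the record length. The R sketch below checks this on a tiny made-up file; the 12-byte record length and the sample records are hypothetical, not taken from the Census record layout. readLines() accepts LF, CRLF or CR as the end-of-line marker, so the two separators need no special handling when reading.

```r
# Hypothetical layout: 12-byte records = 10 data characters + CR + LF.
record_length <- 12
records <- c("110010700A", "110010720B")  # made-up 10-character records

# Write the records with CR+LF separators, as the file description says.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(paste0(records, "\r\n", collapse = "")), path)

# The file size should be an exact multiple of the record length.
n_records <- file.size(path) / record_length
n_records

# readLines() strips the CR+LF, leaving record_length - 2 data characters per line.
lines <- readLines(path)
nchar(lines)
all(nchar(lines) == record_length - 2)
```

On the real file, dividing file.size() by the documented record length is a quick sanity check that you are working from the right layout before choosing widths for read.fwf.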
Some rows contain malformed data, so I suggest using read_delim from the package readr:
library(readr)
data <- read_delim("https://catalog.archives.gov/OpaAPI/media/873805/content/arcmedia/electronic-records/rg-029/cbp-files/RG029.CBP85.T2I1?download=true",
delim = " ", col_names = FALSE)
data
## A tibble: 32,970 x 6
# X1 X2 X3 X4 X5 X6
# <dbl> <chr> <chr> <dbl> <dbl> <chr>
# 1 11001 ---- " 00000003482800000011566200000049404800248400140400048300029200018300006600003400001600000500000100000100000000… 11000 23001 " 424…
# 2 11001 07-- "B00000000000000000000000000000000000000001700001200000300000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 3 11001 0700 "B00000000000000000000000000000000000000001500001000000300000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 4 11001 0720 "B00000000000000000000000000000000000000000200000000000000000100000100000000000000000000000000000000000000000000… 11000 23001 " 424…
# 5 11001 0740 " 00000000001900000000003500000000019100000500000300000200000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 6 11001 0750 "A00000000000000000000000000000000000000000100000100000000000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 7 11001 0780 " 00000000001200000000002700000000035100000600000500000100000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 8 11001 0800 "A00000000000000000000000000000000000000000200000200000000000000000000000000000000000000000000000000000000000000… 11000 23001 " 424…
# 9 11001 10-- "A00000000000000000000000000000000000000000200000100000000000100000000000000000000000000000000000000000000000000… 11000 23001 " 424…
#10 11001 1400 "A00000000000000000000000000000000000000000200000100000000000100000000000000000000000000000000000000000000000000… 11000 23001 " 424…
## … with 32,960 more rows
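Splitting on spaces leaves what appear to be several fixed-width fields glued together in X3. If you know the field widths (they come from the record layout documentation that accompanies the file), base R's read.fwf can cut each record into columns directly. The sketch below uses a made-up two-record file; the widths c(5, 4, 1, 6) and the column names are hypothetical placeholders, not the actual County Business Patterns layout.

```r
# Hypothetical layout: fips (5) + sic (4) + flag (1) + emp (6) = 16 characters,
# i.e. a documented record length of 18 once CR + LF are counted.
records <- c("110010700B000017", "110010740 000019")
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(paste0(records, "\r\n", collapse = "")), path)  # CR+LF endings

widths    <- c(5, 4, 1, 6)                    # hypothetical field widths
col_names <- c("fips", "sic", "flag", "emp")  # hypothetical field names

# Read every field as character so codes such as "0700" keep their leading zeros.
cbp <- read.fwf(path, widths = widths, col.names = col_names,
                colClasses = "character")
cbp$emp <- as.integer(cbp$emp)                # convert numeric fields afterwards
cbp
```

If you prefer to stay with readr, read_fwf() together with fwf_widths() is the equivalent approach there.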
We can use read.table with fill=TRUE in base R:
data <- read.table("https://catalog.archives.gov/OpaAPI/media/873805/content/arcmedia/electronic-records/rg-029/cbp-files/RG029.CBP85.T2I1?download=true", fill = TRUE)
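fill = TRUE pads rows that have fewer whitespace-separated fields with blank fields, so the read does not fail, but it does not realign columns: in fixed-width data a field that is sometimes blank simply disappears, and the values after it shift one column to the left. The toy example below, with made-up records, shows the effect.

```r
# Two made-up records: the first has a blank flag field, the second does not.
lines <- c("11001 ---- 000034 11000",
           "11001 0700 B 000017 11000")

# fill = TRUE pads the shorter first row with a blank fifth field ...
df <- read.table(text = lines, fill = TRUE, colClasses = "character")
dim(df)

# ... but column V3 now mixes an employment count with a flag letter.
df$V3
```

So after reading with fill = TRUE it is worth spot-checking rows where a field may be blank; when the widths are known, a fixed-width reader such as read.fwf avoids the shift.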