Error in read.table with colClasses
I want to read a text file with read.table. One of its three columns contains values such as "000000", but after reading I get 0. I tried:
X<-read.table(ouvrefic, header=TRUE, row.names=1, sep="",colClasses=c("integer","character","factor"))
How should I do this?
Many thanks.
The beginning of my text file:
"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"
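A minimal reproduction of the symptom, as a sketch: it puts the first lines of the file above into an inline character vector (txt is a name introduced here for illustration) and reads them without colClasses, so every column goes through read.table's automatic type conversion.

```r
# First lines of the question's file, kept as an inline character vector.
txt <- c('"" "dates" "Atscan2" "pqrPQR"',
         '"1" "18369" "0000000000000" "1110"',
         '"2" "18369" "0000000000000" "1220,0"')

# Without colClasses, read.table() strips the quotes and then converts each
# column with type.convert(); "0000000000000" looks like a number, so it
# comes back as 0 and the leading zeros are lost.
df <- read.table(text = txt, header = TRUE, row.names = 1)
df$Atscan2
```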
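The problem is in the colClasses argument:
First, you have 4 columns, even though you use the first one as row.names, so that vector needs four elements.
Second, if you want all the zeros to be kept, you need to read that column as character.
The following works: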
df <- read.table(header=TRUE, text='"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"',
row.names=1,
colClasses=c('character', 'character',"character","factor"))
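As noted above, if the elements of a column are quoted (as in the dates column), the integer option in colClasses does not work, so I read that column as character too. You can always convert it to integer afterwards with as.integer.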
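A sketch of that approach on a few lines of the file (txt is again an illustrative inline copy, not the OP's actual file): every column, including the row-names column, gets a colClasses entry, dates is read as character, and it is converted with as.integer once the data frame exists.

```r
# A few lines copied from the question's file; every field is double-quoted.
txt <- c('"" "dates" "Atscan2" "pqrPQR"',
         '"1" "18369" "0000000000000" "1110"',
         '"2" "18369" "0000000000000" "1220,0"',
         '"4" "18369" "0000000000000" "1230,0,0"')

# One colClasses entry per column, including the row-names column.
# Reading dates as character avoids the failure on quoted integers.
df <- read.table(text = txt, header = TRUE, row.names = 1,
                 colClasses = c("character", "character", "character", "factor"))

# Convert dates to integer after the data frame has been built.
df$dates <- as.integer(df$dates)

str(df)
```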
akrun gave a more direct solution in the comments: first remove the double quotes from the lines read with readLines, then apply colClasses to the columns:
df <- read.table(text=gsub('[\\"]', '', readLines('ouvrefic.txt')),
row.names=1,
colClasses=c('character', 'integer', 'character', 'factor'))
df
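The same idea as a self-contained sketch: a temporary file stands in for ouvrefic.txt (the path from tempfile() is hypothetical), and the quotes are removed with a fixed-string gsub, which should behave like the character-class pattern above for this data.

```r
# Hypothetical stand-in for the question's file, written to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c('"" "dates" "Atscan2" "pqrPQR"',
             '"1" "18369" "0000000000000" "1110"',
             '"2" "18369" "0000000000000" "1220,0"',
             '"3" "18369" "0000000000000" "2220"'), path)

# Drop every double quote from the raw lines; fixed = TRUE matches '"'
# literally instead of treating it as a regular expression.
unquoted <- gsub('"', "", readLines(path), fixed = TRUE)

# Without quotes, "integer" works in colClasses. The header line now has
# one field fewer than the data lines (the "" field is gone), which is the
# layout read.table() expects when the first column holds row names.
df <- read.table(text = unquoted, header = TRUE, row.names = 1,
                 colClasses = c("character", "integer", "character", "factor"))
unlink(path)
str(df)
```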
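With row.names=1, colClasses still needs one entry per column, including the row-names column. An NA entry leaves that column to read.table's automatic type conversion, so only Atscan2 has to be forced to character: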
writeLines('"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"', "x.txt")
df <- read.table("x.txt", header = TRUE,
row.names = 1, colClasses = c(NA, NA, "character", NA))
sapply(df, class)
# dates Atscan2 pqrPQR
# "integer" "character" "factor"
# (pqrPQR is "character" in R >= 4.0.0, where stringsAsFactors defaults to FALSE)
df
# dates Atscan2 pqrPQR
# 1 18369 0000000000000 1110
# 2 18369 0000000000000 1220,0
# 3 18369 0000000000000 2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000 1330,0
# 6 18369 0000000000000 2330,0
# 7 18369 0000000000000 3330
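Why NA works where "integer" fails can be seen one level down, with scan(), which read.table uses to read the data. A sketch, relying on the read.table documentation's statement that quoting is only considered for columns read as character; the two field values are taken from the question's file.

```r
quoted <- '"18369" "0000000000000"'

# Read as character: the quotes are interpreted and removed.
fields <- scan(text = quoted, what = "", quiet = TRUE)
print(fields)

# For a column whose colClasses entry is NA, read.table() converts the
# character values afterwards with type.convert(), which is why dates can
# still come out as integer.
dates <- type.convert(fields[1], as.is = TRUE)
print(class(dates))

# Asking for an integer directly leaves the quote characters in the field,
# so scan() is expected to stop with an error; the message is captured.
res <- tryCatch(scan(text = '"18369"', what = integer(), quiet = TRUE),
                error = function(e) conditionMessage(e))
print(res)
```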
Comments:
Could you show us part of the file (the first few lines)?
Please edit it into your question so that we can see the formatting properly.
@Lio If the dates column is not integer, as suggested in the OP's post, maybe this link would help?
@akrun Yes, that is exactly what I wanted to do, but specifying it as integer seems to fail because the elements of the dates column are surrounded by double quotes. You can still do it after df has been created.
One option might be to strip the double quotes with gsub on the output of readLines before calling read.table.
@akrun That works. If I were you, I would post it as an answer.
You can choose to update your answer with it. My solution isn't elegant enough to post as a separate answer :-)
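Another variant strips the quotes with the Unix tr utility via system() (so it needs a Unix-like shell) and names only the Atscan2 column in colClasses:
read.table(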
text = system("cat x.txt | tr -d \\\"", intern = TRUE),
colClasses = c(Atscan2 = "character")
)
# dates Atscan2 pqrPQR
# 1 18369 0000000000000 1110
# 2 18369 0000000000000 1220,0
# 3 18369 0000000000000 2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000 1330,0
# 6 18369 0000000000000 2330,0
# 7 18369 0000000000000 3330
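Because the only column forced in that call is character, the quote-stripping step is arguably optional. A sketch under that assumption, on a temporary copy of a few sample lines (the path is hypothetical): the named colClasses is applied to the quoted file directly, the unnamed columns are treated as NA and converted automatically, and no cat or tr is needed.

```r
path <- tempfile(fileext = ".txt")
writeLines(c('"" "dates" "Atscan2" "pqrPQR"',
             '"1" "18369" "0000000000000" "1110"',
             '"2" "18369" "0000000000000" "1220,0"',
             '"7" "18369" "0000000000000" "3330"'), path)

# Only Atscan2 is named; every other column is treated as NA, so it is read
# as character (quotes removed) and then converted with type.convert().
df <- read.table(path, header = TRUE, row.names = 1,
                 colClasses = c(Atscan2 = "character"))
unlink(path)
sapply(df, class)
```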