Error in read.table with colClasses


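I'm reading a text file with `read.table`. One of its three columns holds character strings such as "000000", but I get 0 instead. I tried:

X<-read.table(ouvrefic, header=TRUE, row.names=1, sep="",colClasses=c("integer","character","factor"))

How should I do this?

Many thanks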

The beginning of my text file:

"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"
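As a minimal sketch of the symptom, assuming a temporary file (created with `tempfile()`) stands in for the real `ouvrefic` file and holds only the first three rows of the sample, reading it without any `colClasses` turns `Atscan2` into numbers:

```r
# Hypothetical stand-in for the question's file: first rows of the sample
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"',
  '"3" "18369" "0000000000000" "2220"'
), path)

# No colClasses: every column goes through type.convert(), so the quoted
# "0000000000000" is recognised as a number and collapses to 0
df <- read.table(path, header = TRUE, row.names = 1)
df$Atscan2
```

The `Atscan2` values print as 0, not as the 13-character strings in the file.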

The problem is in the `colClasses` argument:

First, you have 4 columns, even though you use the first one as `row.names`, so that vector needs four elements.

Second, if you want all the zeros to show up, that column has to be read as character.
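Both points can be checked quickly. This sketch, again assuming a temporary copy of the first rows of the sample rather than the OP's actual file, counts the fields on each line with `count.fields()` and shows what numeric conversion does to the zeros:

```r
path <- tempfile(fileext = ".txt")  # hypothetical copy of the sample
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"'
), path)

# Point 1: the header and each data line have 4 fields (the quoted "" counts),
# so colClasses needs 4 entries even though one column becomes row names
n_fields <- count.fields(path)
n_fields

# Point 2: treated as a number, "0000000000000" loses its zeros;
# kept as a character string, all 13 digits survive
as.numeric("0000000000000")
nchar("0000000000000")
```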

The following works:

df <- read.table(header=TRUE, text='"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"', 
row.names=1, 
colClasses=c('character', 'character',"character","factor"))
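As mentioned above, the problem is that when a column's elements are quoted (like the `dates` column), using the `integer` option in `colClasses` does not work (so I read that column as character too). You can always apply `as.integer` afterwards to convert it to an integer.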
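A small sketch of both halves of that claim, on a temporary copy of the sample (an assumption, not the OP's file): `"integer"` in `colClasses` for the quoted `dates` column stops with an error, while reading it as character and calling `as.integer()` afterwards works:

```r
path <- tempfile(fileext = ".txt")  # hypothetical copy of the sample
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"'
), path)

# Quotes are only honoured for columns read as character, so scan()
# sees the literal '"18369"' where it expects an integer and errors out
res <- tryCatch(
  read.table(path, header = TRUE, row.names = 1,
             colClasses = c("character", "integer", "character", "factor")),
  error = function(e) e
)
inherits(res, "error")

# Workaround from the answer: read dates as character, convert afterwards
df <- read.table(path, header = TRUE, row.names = 1,
                 colClasses = c("character", "character", "character", "factor"))
df$dates <- as.integer(df$dates)
str(df)
```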

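akrun provided a more direct solution in the comments: it first removes the double quotes from the lines read with `readLines`, and then applies `colClasses` to the columns: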

 df <- read.table(text=gsub('[\\"]', '', readLines('ouvrefic.txt')),
                  row.names=1, 
                  colClasses=c('character', 'integer', 'character', 'factor'))
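Here is the same approach as a self-contained sketch; the temporary file is a hypothetical stand-in for `ouvrefic.txt`. No `header = TRUE` is passed: once the quotes are stripped, the header line has one field fewer than the data lines, which is the case in which `read.table` sets `header` to `TRUE` on its own:

```r
path <- tempfile(fileext = ".txt")  # stands in for 'ouvrefic.txt'
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"'
), path)

# Remove every double quote; the header becomes " dates Atscan2 pqrPQR"
# (3 fields) while each data line still has 4
txt <- gsub('[\\"]', '', readLines(path))

# dates is now unquoted, so "integer" in colClasses is accepted
df <- read.table(text = txt, row.names = 1,
                 colClasses = c('character', 'integer', 'character', 'factor'))
str(df)
```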

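Another answer keeps the default type detection: with `row.names = 1`, `colClasses` still needs one entry per column, and `NA` tells `read.table` to work out that column's class itself, so only `Atscan2` has to be forced to character: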

writeLines('"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"', "x.txt")

df <- read.table("x.txt", header = TRUE, 
     row.names = 1, colClasses = c(NA, NA, "character", NA))

sapply(df, class)
#      dates     Atscan2      pqrPQR 
#  "integer" "character"    "factor" 
df
#   dates       Atscan2   pqrPQR
# 1 18369 0000000000000     1110
# 2 18369 0000000000000   1220,0
# 3 18369 0000000000000     2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000   1330,0
# 6 18369 0000000000000   2330,0
# 7 18369 0000000000000     3330
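One caveat about the output above: since R 4.0.0, `stringsAsFactors` defaults to `FALSE`, so on a current R the `NA` entry for `pqrPQR` yields a character column rather than the factor shown. A sketch (on a temporary copy of the first rows, as an assumption) that makes the choice explicit:

```r
path <- tempfile(fileext = ".txt")  # hypothetical copy of the sample
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"'
), path)

# NA = let read.table decide via type.convert(); only Atscan2 is forced
cls <- c(NA, NA, "character", NA)

# Default stringsAsFactors (FALSE since R 4.0.0): pqrPQR stays character
df_chr <- read.table(path, header = TRUE, row.names = 1, colClasses = cls)
# Explicit stringsAsFactors = TRUE reproduces the factor in the answer
df_fac <- read.table(path, header = TRUE, row.names = 1, colClasses = cls,
                     stringsAsFactors = TRUE)

sapply(df_chr, class)
sapply(df_fac, class)
```

Columns listed explicitly as `"character"` are left alone either way, so `Atscan2` keeps its zeros in both results.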

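Can you show us part of the file (the first few lines)? The OP replied in a comment with the first lines of the file, as shown in the question. Please edit that into your question so we can see the formatting properly.

@Lio The `dates` column is not `integer` as the OP's post intends. Maybe this link could help?

@akrun Yes, that is exactly what I was trying to do, but specifying it as `integer` seems to fail because the elements of the `dates` column are wrapped in double quotes. You can, however, convert it after `df` has been created.

One option might be to strip the double quotes first with `gsub` on the output of `readLines` (akrun's code above).

@akrun That will work. If I were you, I would post it as an answer. Feel free to update it. My solution is not elegant enough to be posted as a new answer :-)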
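Another answer removes the double quotes with a shell pipeline before parsing (this relies on `cat` and `tr`, so it needs a Unix-like system):

read.table(
    text = system("cat x.txt | tr -d \\\"", intern = TRUE), 
    colClasses = c(Atscan2 = "character")
)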
#   dates       Atscan2   pqrPQR
# 1 18369 0000000000000     1110
# 2 18369 0000000000000   1220,0
# 3 18369 0000000000000     2220
# 4 18369 0000000000000 1230,0,0
# 5 18369 0000000000000   1330,0
# 6 18369 0000000000000   2330,0
# 7 18369 0000000000000     3330
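Where `cat` and `tr` are not available, a possible variant, assuming an R version whose `read.table` accepts a named `colClasses` (the answer above already relies on this), is to pass the named vector straight to the quoted file. Columns not named in `colClasses` default to `NA`, and `NA` columns are read as character before conversion, so their quotes are still removed. The temporary file is a hypothetical copy of `x.txt`:

```r
path <- tempfile(fileext = ".txt")  # hypothetical copy of x.txt
writeLines(c(
  '"" "dates" "Atscan2" "pqrPQR"',
  '"1" "18369" "0000000000000" "1110"',
  '"2" "18369" "0000000000000" "1220,0"'
), path)

# Only Atscan2 is fixed by name; dates and pqrPQR go through type.convert()
df <- read.table(path, header = TRUE, row.names = 1,
                 colClasses = c(Atscan2 = "character"))
str(df)
```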