Conflict between the comment character and the header when importing a data frame with read.table

How can I import a file that:
- starts with an undefined number of comment lines
- is followed by a header line in which some column names contain the same character that marks the comment lines above

For example, given a file like this:
# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8
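then:
Error in read.table(myfile, sep = ",", header = T) : more columns than column names
The obvious problem is that # is used as the comment character to mark the comment lines, but it also appears in the header (admittedly bad practice, but I have no control over it).
The number of comment lines is not known in advance, so I can't even use the skip argument. I also don't know the column names (or even how many there are) before importing, so I really need to read them from the file.
Is there any solution other than editing the file by hand?

It may be easy enough to count the lines that start with a comment and then skip them: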
csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"
# return a logical for whether the line starts with a comment.
# remove everything from the first FALSE and afterward
# take the sum of what's left
start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))
# skip the lines that start with the comment character
Data <- read.csv(textConnection(csvfile),
                 skip = start_comment,
                 stringsAsFactors = FALSE)
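One caveat with the read.csv call above: by default read.csv passes column names through make.names (check.names = TRUE), so the header c#02 comes back as c.02. The following is a minimal sketch, not part of the original answer, that keeps the names exactly as written. The variable names all_lines, is_comment and n_comments are chosen here for illustration, and the comment count is recomputed so the block stands alone.

```r
csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"

# Count the comment lines at the top of the input.
all_lines <- strsplit(csvfile, "\n", fixed = TRUE)[[1]]
is_comment <- grepl("^#", all_lines)
n_comments <- which(!is_comment)[1] - 1

# read.csv defaults to comment.char = "", so the '#' in the header is not
# treated as a comment. check.names = FALSE stops R from rewriting the
# name "c#02" into the syntactically valid "c.02".
Data <- read.csv(text = csvfile,
                 skip = n_comments,
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

names(Data)
```

Columns with non-syntactic names then need backticks or [[ ]] to access, for example Data[["c#02"]].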
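Use readLines to import the whole thing as strings, then clean it up into a standard format.
Clean the file before importing it into R. Perhaps you can go to the source and deal with it there.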
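The readLines suggestion can be sketched as follows. This is one possible reading of that comment rather than the commenter's own code: read every line as text, drop the leading run of comment lines, and pass the rest to read.table through its text argument with comment.char = "" so that the # in the header is left alone. The helper name read_after_comments and the temporary file are assumptions made for the example.

```r
# Hypothetical helper: read a delimited file whose leading comment lines
# start with '#', even when '#' also appears in the header line.
read_after_comments <- function(path, sep = ",") {
  all_lines <- readLines(path)
  is_comment <- grepl("^#", all_lines)
  # Index of the first non-comment line, i.e. the header.
  first_data <- which(!is_comment)[1]
  if (is.na(first_data)) stop("file contains only comment lines")
  kept <- all_lines[first_data:length(all_lines)]
  read.table(text = kept,
             sep = sep,
             header = TRUE,
             comment.char = "",
             check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Usage with a temporary file holding the example from the question.
tmp <- tempfile(fileext = ".csv")
writeLines(c("# comment 1", "# ...", "# comment X",
             "c01,c#02,c03,c04", "1,2,3,4", "5,6,7,8"), tmp)
Data <- read_after_comments(tmp)
str(Data)
```

Only the comment lines before the header are dropped; a line starting with # further down the file would be kept and parsed as data, which may or may not be what is wanted.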
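With read.table, the header can be read on its own with comment.char = "" and then applied to the data, which keeps names such as c#02 unchanged:
# Count the leading comment lines, as above.
start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))
# Get the headers by themselves; comment.char = "" keeps the '#' in "c#02".
Head <- read.table(textConnection(csvfile),
                   skip = start_comment,
                   header = FALSE,
                   sep = ",",
                   comment.char = "",
                   nrows = 1,
                   stringsAsFactors = FALSE)
# Read the data rows, skipping the comment lines and the header line.
Data <- read.table(textConnection(csvfile),
                   sep = ",",
                   header = FALSE,
                   skip = start_comment + 1,
                   stringsAsFactors = FALSE)
# apply column names to Data
names(Data) <- unlist(Head)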