Automatically obtaining complex headers in R


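I would like to ask for a script that detects and merges header rows in R when a table has a multi-row header, as in the example below. A general answer should: (1) identify the number of header rows (two or more), (2) fill in the gaps in the header (see the NAs in the example), and (3) merge all header rows into a single row.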

So far I have only managed to do this manually; see below. Ideally the solution would work for headers with any number of rows.

text1<-"NA      h_row1a NA      NA      NA      h_row1b NA      NA      NA
        NA      h_row2a NA      h_row2b NA      h_row2c NA      h_row2d NA
        NA      h_row3a h_row3b h_row3c h_row3d h_row3e h_row3f h_row3g h_row3h
element1        2       24%     25      40      23      44%     76      34
element2        3       26%     40      86      233     12%     55      12"
table1<-read.table(text=text1, skip=3,header=FALSE)
cat(text1, file = "ex.data")
header<-scan("ex.data", nlines = 1, what = character(), sep="", na.strings = "NA")
library(zoo)
header<-na.locf(header, na.rm=FALSE) # this fills the header gaps
header2 <- scan("ex.data", skip = 1, nlines = 1, what = character(), sep="", na.strings = "NA")
header2<-na.locf(header2, na.rm=FALSE)
header3 <- scan("ex.data", skip = 2, nlines = 1, what = character(), sep="", na.strings = "NA")
names(table1) <- paste0(header, header2, header3)
table1
#    NANANA h_row1ah_row2ah_row3a h_row1ah_row2ah_row3b h_row1ah_row2bh_row3c h_row1ah_row2bh_row3d h_row1bh_row2ch_row3e h_row1bh_row2ch_row3f, etc.
#1 element1                     2                   24%                    25                    40                    23                   44%, etc.
#2 element2                     3                   26%   , etc.

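You can do it like this. It uses rle to see how many leading rows contain no value that can be coerced to numeric, and assumes those rows are the header. I have also set the first column as the rownames; I am not sure whether you want that. After this you will probably also want to convert the remaining values to numeric, because at this point they are still character.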
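To illustrate the rle idea on its own, here is a minimal sketch using a toy logical vector in place of the real per-row test. The names has_number and n_header are made up for this illustration, and the guard for a table with no header rows is an added assumption, not something the answer's code does.

```r
# Toy flags, one per row: TRUE where the row holds at least one numeric-looking value
has_number <- c(FALSE, FALSE, FALSE, TRUE, TRUE)

# rle() compresses the vector into runs of equal values
runs <- rle(has_number)
runs$values   # value of each run
runs$lengths  # length of each run

# The leading run counts as header rows only if it is a run of FALSE
# (assumption: if the first row already looks numeric, there is no header)
n_header <- if (!runs$values[1]) runs$lengths[1] else 0L
n_header
```

With the example table from the question, the first three rows would be flagged FALSE, so the leading run has length 3.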


What have you tried so far? What specific problem did you run into when implementing it? Show us your code!
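library(zoo) # provides na.locf()
tab <- read.table(text = text1, header = FALSE, stringsAsFactors = FALSE)
# estimate the number of header rows: the leading run of rows with no value coercible to numeric
# (suppressWarnings() hides the expected "NAs introduced by coercion" warnings)
is_data <- apply(tab, 1, function(x) any(!is.na(suppressWarnings(as.numeric(x)))))
headrows <- rle(is_data)$lengths[1] # assumes there is at least one header row
# fill in blanks in headers by carrying the last non-NA value forward along each row
tab[1:headrows, ] <- t(apply(tab[1:headrows, ], 1, na.locf, na.rm = FALSE))
# merge the header rows into one name per column
names(tab) <- apply(tab[1:headrows, ], 2, paste0, collapse = "_")
tab <- tab[-(1:headrows), ] # remove header rows (now set as column names)
rownames(tab) <- tab[, 1]
tab <- tab[, -1] # remove first column (now set as rownames)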

tab
         h_row1a_h_row2a_h_row3a h_row1a_h_row2a_h_row3b h_row1a_h_row2b_h_row3c h_row1a_h_row2b_h_row3d
element1                       2                     24%                      25                      40
element2                       3                     26%                      40                      86
         h_row1b_h_row2c_h_row3e h_row1b_h_row2c_h_row3f h_row1b_h_row2d_h_row3g h_row1b_h_row2d_h_row3h
element1                      23                     44%                      76                      34
element2                     233                     12%                      55                      12
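The values above are still character, as noted in the answer. A possible follow-up conversion to numeric could look like the sketch below. It runs on a small stand-in data frame (tab_chr, with shortened column names) instead of the real result, and to_number is a hypothetical helper. Treating "24%" as the fraction 0.24 is an assumption; you may prefer to keep 24 and drop only the percent sign.

```r
# Small stand-in for the character table produced above (column names shortened)
tab_chr <- data.frame(
  a = c("2", "3"),
  b = c("24%", "26%"),
  row.names = c("element1", "element2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip a trailing "%" and divide by 100, otherwise parse as-is
to_number <- function(x) {
  is_pct <- grepl("%$", x)
  out <- as.numeric(sub("%$", "", x))
  out[is_pct] <- out[is_pct] / 100
  out
}

# Assigning into tab_num[] keeps the data frame shape, column names and rownames
tab_num <- tab_chr
tab_num[] <- lapply(tab_chr, to_number)
str(tab_num)
```

Applied to the real table, the same pattern would be tab[] <- lapply(tab, to_number).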