将列表的控制台输出转换为真实的R列表_R_List_Parsing

将列表的控制台输出转换为真实的R列表

r list parsing

将列表的控制台输出转换为真实的R列表,r,list,parsing,R,List,Parsing,有人刚刚发布了一些控制台输出作为示例。（这种情况经常发生，我有转换向量和数据帧的打印输出的策略。）我想知道是否有人有一种优雅的方法将其解析为真正的R列表 test <- "[[1]] [1] 1.0000 1.9643 4.5957 [[2]] [1] 1.0000 2.2753 3.8589 [[3]] [1] 1.0000 2.9781 4.5651 [[4]] [1] 1.0000 2.9320 3.5519 [[5]] [1] 1.0000 3.5772 2.8560 [

有人刚刚发布了一些控制台输出作为示例。（这种情况经常发生，我有转换向量和数据帧的打印输出的策略。）我想知道是否有人有一种优雅的方法将其解析为真正的R列表

test <- "[[1]]
[1] 1.0000 1.9643 4.5957

[[2]]
[1] 1.0000 2.2753 3.8589

[[3]]
[1] 1.0000 2.9781 4.5651

[[4]]
[1] 1.0000 2.9320 3.5519

[[5]]
[1] 1.0000 3.5772 2.8560

[[6]]
[1] 1.0000 4.0150 3.1937

[[7]]
[1] 1.0000 3.3814 3.4291"

我不会称之为“优雅”，但对于未命名的列表，您可以按照以下方式进行一些检查/修改：

s <- strsplit(gsub("\\[+\\d+\\]+", "", test), "\n+")[[1]][-1]
lapply(s, function(x) scan(text = x, what = double(), quiet = TRUE))

[[1]]
[1] 1.0000 1.9643 4.5957

[[2]]
[1] 1.0000 2.2753 3.8589

[[3]]
[1] 1.0000 2.9781 4.5651

[[4]]
[1] 1.0000 2.9320 3.5519

[[5]]
[1] 1.0000 3.5772 2.8560

[[6]]
[1] 1.0000 4.0150 3.1937

[[7]]
[1] 1.0000 3.3814 3.4291

这是我的解决办法。它在您的测试用例和我测试过的其他几个测试用例上都运行良好

deprint <- function(ll) {
    ## Pattern to match strings beginning with _at least_ one $x or [[x]]
    branchPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])"
    ## Pattern to match strings with _just_ one $x or one [[x]]
    trunkPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])\\s*$"
    ##
    isBranch <- function(X) {
        grepl(branchPat, X[1])
    }
    ## Parse character vectors of lines like "[1] 1 3 4" or
    ## "[1] TRUE FALSE" or c("[1] a b c d", "[5] e f") 
    readTip <- function(X) {
        X <- paste(sub("^\\s*\\[.*\\]", "", X), collapse=" ")
        tokens <- scan(textConnection(X), what=character(), quiet=TRUE)
        read.table(text = tokens, stringsAsFactors=FALSE)[[1]]
    }

    ## (0) Split into vector of lines (if needed) and
    ##     strip out empty lines
    ll <- readLines(textConnection(ll))
    ll <- ll[ll!=""]

    ## (1) Split into branches ...
    trunks <- grep(trunkPat, ll)
    grp <- cumsum(seq_along(ll) %in% trunks)
    XX <- split(ll, grp)
    ## ... preserving element names, where present
    nms <- sapply(XX, function(X) gsub("\\[.*|\\$", "", X[[1]]))
    XX <-  lapply(XX, function(X) X[-1])
    names(XX) <- nms

    ## (2) Strip away top-level list identifiers.
    ## pat2 <- "^\\$[^$\\[]*"
    XX <- lapply(XX, function(X) sub(branchPat, "", X))

    ## (3) Step through list elements:
    ## - Branches will need further recursive processing.
    ## - Tips are ready to parse into base type vectors.
    lapply(XX, function(X) {
        if(isBranch(X)) deprint(X) else readTip(X)
    })
}

下面是它如何处理

test

的打印表示，这是您更常见的列表：

deprint(test)
## [[1]]
## [1] 1.0000 1.9643 4.5957
## 
## [[2]]
## [1] 1.0000 2.2753 3.8589
## 
## [[3]]
## [1] 1.0000 2.9781 4.5651
## 
## [[4]]
## [1] 1.0000 2.9320 3.5519
## 
## [[5]]
## [1] 1.0000 3.5772 2.8560
## 
## [[6]]
## [1] 1.0000 4.0150 3.1937
## 
## [[7]]
## [1] 1.0000 3.3814 3.4291

还有一个例子：

head(as.data.frame(deprint(capture.output(as.list(mtcars)))))
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

认真地说，请求

dput

输出。如果他们不提供，投反对票，继续前进。你可以使用像

lappy（readLines（文本连接（gsub）（\n（？=\n）\\[\\\[\\\\[\\\\[\\\\\[\\\\\\[\\\\\\\\\\\\[\\\\\\\\\[\\\\\\\\\\[\\\\\\\\\\\\\\\\[\\\\\\\\\\\[\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\。另一种畸形是read.delim（text=gsub（“\\[+\\d+\\]+”，“”，test），header=FALSE，sep=“”）
，但仅适用于这种情况。@Andrie。甚至在这里都不起作用。提供3列数据框而不是7元素列表。@BondedDust True。关闭，但没有雪茄。您可以测试向量行是否存在“
来确定您运行哪种类型的扫描。对我也适用。谢谢。我昨天本来打算发放奖金的，我想我是通过勾选答案来实现的，但今天收到一条消息警告我奖金即将到期，所以我回来点击蓝色+500icon@BondedDust--谢谢。那是相当慷慨的赏金。我将不得不留意一些其他问题，以便分发它，从而传播节日欢乐！我不认为R特别适合递归，尽管它使用列表作为基本结构。当我咬了一口口齿不清的时候，我总是觉得很不舒服。我的另一个弱点是迭代定义数据的顺序处理：Y\u I
deprint <- function(ll) {
    ## Pattern to match strings beginning with _at least_ one $x or [[x]]
    branchPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])"
    ## Pattern to match strings with _just_ one $x or one [[x]]
    trunkPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])\\s*$"
    ##
    isBranch <- function(X) {
        grepl(branchPat, X[1])
    }
    ## Parse character vectors of lines like "[1] 1 3 4" or
    ## "[1] TRUE FALSE" or c("[1] a b c d", "[5] e f") 
    readTip <- function(X) {
        X <- paste(sub("^\\s*\\[.*\\]", "", X), collapse=" ")
        tokens <- scan(textConnection(X), what=character(), quiet=TRUE)
        read.table(text = tokens, stringsAsFactors=FALSE)[[1]]
    }

    ## (0) Split into vector of lines (if needed) and
    ##     strip out empty lines
    ll <- readLines(textConnection(ll))
    ll <- ll[ll!=""]

    ## (1) Split into branches ...
    trunks <- grep(trunkPat, ll)
    grp <- cumsum(seq_along(ll) %in% trunks)
    XX <- split(ll, grp)
    ## ... preserving element names, where present
    nms <- sapply(XX, function(X) gsub("\\[.*|\\$", "", X[[1]]))
    XX <-  lapply(XX, function(X) X[-1])
    names(XX) <- nms

    ## (2) Strip away top-level list identifiers.
    ## pat2 <- "^\\$[^$\\[]*"
    XX <- lapply(XX, function(X) sub(branchPat, "", X))

    ## (3) Step through list elements:
    ## - Branches will need further recursive processing.
    ## - Tips are ready to parse into base type vectors.
    lapply(XX, function(X) {
        if(isBranch(X)) deprint(X) else readTip(X)
    })
}

## Because deprint() interprets numbers without a decimal part as integers,
## I've modified L slightly, changing "list(w=2,4)" to "list(w=2L,4L)" 
## to allow a meaningful test using identical(). 
L <-
structure(list(a = structure(list(d = 1:2, j = 5:6, o = structure(list(
    w = 2L, 4L), .Names = c("w", ""))), .Names = c("d", "j", "o"
)), b = "c", c = 3:4), .Names = c("a", "b", "c"))

## Capture the print representation of L, and then feed it to deprint()
test2 <- capture.output(L)
LL <- deprint(test2)
identical(L, LL)
## [1] TRUE
LL
## $a
## $a$d
## [1] 1 2
## 
## $a$j
## [1] 5 6
## 
## $a$o
## $a$o$w
## [1] 2
## 
## $a$o[[2]]
## [1] 4
## 
## $b
## [1] "c"
## 
## $c
## [1] 3 4

deprint(test)
## [[1]]
## [1] 1.0000 1.9643 4.5957
## 
## [[2]]
## [1] 1.0000 2.2753 3.8589
## 
## [[3]]
## [1] 1.0000 2.9781 4.5651
## 
## [[4]]
## [1] 1.0000 2.9320 3.5519
## 
## [[5]]
## [1] 1.0000 3.5772 2.8560
## 
## [[6]]
## [1] 1.0000 4.0150 3.1937
## 
## [[7]]
## [1] 1.0000 3.3814 3.4291

head(as.data.frame(deprint(capture.output(as.list(mtcars)))))
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1