XML: converting an unevenly nested list to a data frame
I don't think this has been asked yet, but is there a way to combine the information in a list with multiple levels and an uneven structure into a "long"-format data frame? Specifically:
library(XML)
library(plyr)
xml.inning <- "http://gd2.mlb.com/components/game/mlb/year_2009/month_05/day_02/gid_2009_05_02_chamlb_texmlb_1/inning/inning_5.xml"
xml.parse <- xmlInternalTreeParse(xml.inning)
xml.list <- xmlToList(xml.parse)
## $top$atbat
## $top$atbat$pitch
## des id type x y
## "Ball" "310" "B" "70.39" "125.20"
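What I want is a data frame built from the named vectors in the pitch elements, together with the levels they sit under (top, atbat, bottom). That means ignoring the levels that don't fit into a data.frame because they have a different number of columns. Something like this: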
first second third des x
1 top atbat pitch Ball 70.29
2 top atbat pitch Strike 69.24
3 bottom atbat pitch Out 67.22
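Is there an elegant way to do this? Thanks!

I don't know about elegant, but this works. Those more familiar with plyr can probably provide a more general solution.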
cleanFun <- function(x) {
a <- x[["atbat"]]
b <- do.call(rbind,a[names(a)=="pitch"])
c <- as.data.frame(b)
}
ldply(xml.list[c("top","bottom")], cleanFun)[,1:5]
.id des id type x
1 top Ball 310 B 70.39
2 top Called Strike 311 S 118.45
3 top Called Strike 312 S 86.70
4 top In play, out(s) 313 X 79.83
5 bottom Ball 335 B 15.45
6 bottom Called Strike 336 S 77.25
7 bottom Swinging Strike 337 S 99.57
8 bottom Ball 338 B 106.44
9 bottom In play, out(s) 339 X 134.76
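In case the feed at the URL above is unavailable, the same idea can be tried on a small hand-built list. This is only a sketch: `toy` is a hypothetical miniature that is assumed to mimic the shape of the `xmlToList()` result, and `clean_half` is an illustrative base-R rewrite of cleanFun that avoids the plyr dependency. Note that `x[["atbat"]]` selects only the first element named atbat in each half-inning.

```r
# Hypothetical miniature of the xmlToList() result (an assumption, not real
# feed data): each half-inning holds one atbat that mixes "pitch" vectors
# with an ".attrs" vector of a different length.
toy <- list(
  top = list(atbat = list(
    pitch  = c(des = "Ball", id = "310", type = "B"),
    pitch  = c(des = "Called Strike", id = "311", type = "S"),
    .attrs = c(num = "1", b = "1", s = "1", o = "1")
  )),
  bottom = list(atbat = list(
    pitch  = c(des = "Ball", id = "335", type = "B"),
    .attrs = c(num = "2", b = "1", s = "0", o = "1")
  ))
)

# Keep only the "pitch" entries of the first atbat and stack them row-wise.
clean_half <- function(x) {
  a <- x[["atbat"]]
  pitches <- unname(a[names(a) == "pitch"])  # drop names to avoid duplicate row names
  as.data.frame(do.call(rbind, pitches), stringsAsFactors = FALSE)
}

# Base-R stand-in for ldply(): clean each half, then prepend an .id column
# recording which half-inning the rows came from.
pieces <- lapply(toy[c("top", "bottom")], clean_half)
result <- do.call(rbind, lapply(names(pieces), function(nm) {
  cbind(.id = nm, pieces[[nm]], stringsAsFactors = FALSE)
}))
print(result)
```

The `.attrs` vectors are dropped by the name filter, which is what lets the remaining rows share one set of columns.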
The .id feature of ldply() is nice, but the .id columns seem to overlap once you run another ldply() on the result.

Here is a fairly general function that uses rbind.fill():
aho <- ldply(llply(xml.list[[1]], function(x) ldply(x, function(x) rbind.fill(data.frame(t(x))))))
> aho[1:5,1:4]
.id des id type
1 pitch Ball 310 B
2 pitch Called Strike 311 S
3 pitch Called Strike 312 S
4 pitch In play, out(s) 313 X
5 .attrs Alexei Ramirez lines out to second baseman Ian Kinsler. <NA> <NA>
Renaming the inner .id column to .id2 before the outer ldply() keeps both levels:
aho2 <- ldply(llply(xml.list[[1]], function(x) {
out <- ldply(x, function(x) rbind.fill(data.frame(t(x))))
names(out)[1] <- ".id2"
out
}))
> aho2[1:5,1:4]
.id .id2 des id
1 atbat pitch Ball 310
2 atbat pitch Called Strike 311
3 atbat pitch Called Strike 312
4 atbat pitch In play, out(s) 313
5 atbat .attrs Alexei Ramirez lines out to second baseman Ian Kinsler. <NA>
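For the layout asked for in the question (one column per nesting level, pitch rows only), a recursive walk is another option. The following is a minimal base-R sketch, not the approach used in the answers above: `collect_leaves`, the `level1`, `level2`, ... column names, and the `toy` list are all hypothetical names introduced for illustration, and the code assumes every pitch vector carries the same set of names (otherwise something like rbind.fill() would be needed in place of rbind()).

```r
# Recursively walk a nested list. Whenever a non-list element named `target`
# is reached, emit a one-row data frame holding the path of names that led
# to it plus the element's own named values.
collect_leaves <- function(x, target = "pitch", path = character(0)) {
  rows <- list()
  for (i in seq_along(x)) {
    nm <- names(x)[i]
    here <- c(path, nm)
    if (is.list(x[[i]])) {
      # Descend into sub-lists (half-innings, atbats, ...).
      rows <- c(rows, collect_leaves(x[[i]], target, here))
    } else if (identical(nm, target)) {
      lvl <- setNames(as.list(here), paste0("level", seq_along(here)))
      rows <- c(rows, list(
        data.frame(c(lvl, as.list(x[[i]])), stringsAsFactors = FALSE)
      ))
    }
  }
  rows
}

# Hypothetical miniature of the parsed inning (assumed structure).
toy <- list(
  top = list(atbat = list(
    pitch  = c(des = "Ball", x = "70.39"),
    pitch  = c(des = "Called Strike", x = "118.45"),
    .attrs = c(num = "1", b = "1", s = "2")
  )),
  bottom = list(
    atbat = list(
      pitch  = c(des = "Ball", x = "15.45"),
      .attrs = c(num = "2", b = "1", s = "0")
    ),
    atbat = list(
      pitch  = c(des = "Swinging Strike", x = "99.57"),
      .attrs = c(num = "3", b = "0", s = "1")
    )
  )
)

long <- do.call(rbind, collect_leaves(toy))
print(long)
```

Unlike indexing with `x[["atbat"]]`, the walk visits every atbat in each half-inning, and elements such as `.attrs` are skipped because their name does not match `target`.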