XML R: getting column names from a data.frame


I have the following code:

install.packages("XML")
library(XML)
install.packages("plyr")
library(plyr)

feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
reuters<-xmlToList(feed)
data <- lapply(reuters[[1]][names(reuters[[1]])=="item"], data.frame)

data

You have a list of data.frames. You can bind them together:

> names(do.call(rbind.data.frame, data))
[1] "title"           "link"            "description"     "category.text"  
[5] "category..attrs" "pubDate"         "guid.text"       "guid..attrs"    
[9] "origLink"

> data1 <- do.call(rbind.data.frame, data)
> head(data1$title)
[1] Niger says will repatriate its illegal migrants from Algeria     
[2] Twin bombing near Kurdish party office in north Iraq kills 30    
[3] Suicide bomber kills four soldiers in Pakistan's tribal northwest
[4] Sisi keeps Egyptian premier to fix economy after turmoil         
[5] Kosovo's Thaci has tough job to form new cabinet, keep promises  
[6] Libyan Supreme Court rules PM's election unconstitutional        
25 Levels: Niger says will repatriate its illegal migrants from Algeria ...
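
The feed URL above may not resolve any more, so here is a minimal sketch of the same binding step on a hand-made list. The `items` list and its values are hypothetical stand-ins for `data`, and `stringsAsFactors = FALSE` is an assumption added here so the titles stay character vectors rather than factors as in the output above.

```r
# Hypothetical stand-in for `data`: a named list of one-row data frames
items <- list(
  item = data.frame(title = "First headline",  link = "http://example.com/1",
                    stringsAsFactors = FALSE),
  item = data.frame(title = "Second headline", link = "http://example.com/2",
                    stringsAsFactors = FALSE)
)

# Stack the one-row data frames into a single data frame
combined <- do.call(rbind.data.frame, items)

# Column names of the combined data frame
names(combined)

# A single column, returned as a character vector
combined$title
```

Once the pieces are bound, `names()` gives the column names and `$` pulls out any single column.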
If you just want the titles:

xData <- xmlParse(feed)
> head(xpathSApply(xData, "//title", xmlValue))
[1] "Reuters: World News"                                                 
[2] "Reuters: World News"                                                 
[3] "South Africa platinum strike talks in crucial final day of mediation"
[4] "Africa's sports bars, TV shacks step up security for World Cup"      
[5] "Niger says will repatriate its illegal migrants from Algeria"        
[6] "Twin bombing near Kurdish party office in north Iraq kills 30"     
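
Note that `//title` also matches the feed's own title elements, which is why "Reuters: World News" shows up in the output. A small sketch, assuming the item titles sit directly under `<item>` elements, narrows the XPath to `//item/title`. The inline `rss` string is a hypothetical miniature feed used only so the example runs without network access.

```r
library(XML)

# Hypothetical miniature feed; the real one has many more fields
rss <- '<rss><channel>
  <title>Example Feed</title>
  <item><title>First headline</title><link>http://example.com/1</link></item>
  <item><title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>'

# Parse the XML from a string instead of a URL
doc <- xmlParse(rss, asText = TRUE)

# "//title" matches every <title>, including the channel's own
all_titles <- xpathSApply(doc, "//title", xmlValue)

# "//item/title" keeps only titles that are direct children of an <item>
item_titles <- xpathSApply(doc, "//item/title", xmlValue)
```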
You can also retrieve just the titles without binding the data frames:

Titles <- character(length(data))
for (i in seq_along(data)) Titles[i] <- as.character(data[[i]]$title)
Titles
[1] "Niger says will repatriate its illegal migrants from Algeria"                     "Twin bombing near Kurdish party office in north Iraq kills 30"                   
[3] "Suicide bomber kills four soldiers in Pakistan's tribal northwest"                "Sisi keeps Egyptian premier to fix economy after turmoil"                        
[5] "Kosovo's Thaci has tough job to form new cabinet, keep promises"                  "Libyan Supreme Court rules PM's election unconstitutional"                       
[7] "Thai junta to explain itself to international rights groups"                      "Well-trained and armed, Taliban tried to hijack plane in Pakistan"               
[9] "Russia would react to NATO build-up near borders: minister"                       "Myanmar military 'tortures civilians': human rights group"                       
[11] "Five jailed for killing Russia's Politkovskaya, mastermind unknown" 
...

Here is how I usually do it, quick and simple:

unname(sapply(data, '[[', 'title'))

# [1] South Africa platinum strike talks in crucial final day of mediation
# [2] Africa's sports bars, TV shacks step up security for World Cup      
# [3] Niger says will repatriate its illegal migrants from Algeria        
# [4] Twin bombing near Kurdish party office in north Iraq kills 30       
# [5] Suicide bomber kills four soldiers in Pakistan's tribal northwest   
# [6] Sisi keeps Egyptian premier to fix economy after turmoil            
# 25 Levels: South Africa platinum strike talks in crucial final day of mediation ...
You can access any other element in the same way, for example:

unname(sapply(data, '[[', 'link'))
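
The "25 Levels" line in the output above shows that the titles come back as a factor, which is what `data.frame` produced by default before R 4.0.0. If plain character vectors are wanted, one possible variation is to convert inside `vapply`. This is only a sketch: `get_field` is a hypothetical helper name, and `items` is a made-up stand-in for `data`.

```r
# Hypothetical stand-in for `data`, with factor columns as in the output above
items <- list(
  item = data.frame(title = "First headline",  link = "http://example.com/1",
                    stringsAsFactors = TRUE),
  item = data.frame(title = "Second headline", link = "http://example.com/2",
                    stringsAsFactors = TRUE)
)

# Hypothetical helper: pull one field from every element as a character vector
get_field <- function(lst, field) {
  unname(vapply(lst, function(x) as.character(x[[field]]), character(1)))
}

titles <- get_field(items, "title")
links  <- get_field(items, "link")
```

`vapply` with `character(1)` also fails loudly if an element does not yield exactly one string, which `sapply` would silently tolerate.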

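Thanks, that was simple. Is there another way that avoids binding the data frames, or is binding necessary?

@user1477388 You can parse the XML and use xpath to get the titles, if that is all you want.

I was just interested in other approaches, since I'm trying to learn R. I believe your binding approach is probably the best.

(+1) for xpathSApply, nice one.

There are many ways to get data out of XML. Which one fits depends mostly on the complexity of the XML.

One of my favorite answers. Oh, that's simple. Thank you very much.

unname is optional; it just tidies up the output a little :)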