Nested XML to R data frame
I am trying to get the solution found here to work in the presence of repeated nesting (I'm not an HTML person, so I'm not sure that is the right term).

The code referenced above returns the first cycle, but I need to import all cycles. The data looks like this:
<Record>
<LastName>REDACTED</LastName>
<FirstName>REDACTED</FirstName>
<DOB>REDACTED</DOB>
<Rapsheet>
<Header>
<DateOfBirth>REDACTED</DateOfBirth>
<SID>REDACTED</SID>
<Summary>
<DateOfLastArrest>
10/01/2012
</DateOfLastArrest>
<AgeOfOffender>21</AgeOfOffender>
<FailuresToAppear>0</FailuresToAppear>
<ViolationOfCourtOrdersOrConditions>
0
</ViolationOfCourtOrdersOrConditions>
<FelonyArrestsConvictions>
0/0
</FelonyArrestsConvictions>
<MisdemeanorArrestsConvictions>
0/0
</MisdemeanorArrestsConvictions>
<UnknownOffenseLevelArrestsConvictions>
1/0
</UnknownOffenseLevelArrestsConvictions>
<AssaultOnOfficerCharges>
0
</AssaultOnOfficerCharges>
<DeadlyWeaponRelatedCharges>
0
</DeadlyWeaponRelatedCharges>
<EscapeCharges>
0
</EscapeCharges>
<ViolationOfProbationParoleCharges>
0/0
</ViolationOfProbationParoleCharges>
</Summary>
</Header>
<Title>VERMONT CRIMINAL HISTORY</Title>
<Identification>
<VermontStateID>REDACTED</VermontStateID>
<DateOfBirth>REDACTED</DateOfBirth>
<PlaceOfBirthCity></PlaceOfBirthCity>
<PlaceOfBirthStateOrCountry></PlaceOfBirthStateOrCountry>
<Sex>F</Sex>
<Race>W</Race>
<Ethnicity>
</Ethnicity>
<USCitizen></USCitizen>
<Height>503</Height>
<Weight>180</Weight>
<EyeColor>GRN</EyeColor>
<HairColor>BLN</HairColor>
<ScarsMarksTattoos>
<SMTCode>TATTOO</SMTCode>
<SMTDescription>ARABIC TATOO ON ARM</SMTDescription>
</ScarsMarksTattoos>
<ScarsMarksTattoos>
<SMTCode>TATTOO</SMTCode>
<SMTDescription>NOSE RING LIP RINGS</SMTDescription>
</ScarsMarksTattoos>
<PrintsNCIC></PrintsNCIC>
<HenryUp></HenryUp>
<HenryLow></HenryLow>
<PhotoAvailable></PhotoAvailable>
<Address>
<Street>REDACTED</Street>
<City>WINOOSKI</City>
<State>VT</State>
<Zip>05404</Zip>
</Address>
</Identification>
<CriminalHistory>
<Cycle>
<CycleNumber>1</CycleNumber>
<TrackingNumber>1709462</TrackingNumber>
<Arrest>
<DateOfArrest>10/01/2012 </DateOfArrest>
<ArrestAgency>WINOOSKI PD VT0040400</ArrestAgency>
<ArrestAgencyCaseNumber>12WS04470</ArrestAgencyCaseNumber>
<Fingerprint>NO</Fingerprint>
<Charge>
<ChargeNumber>01</ChargeNumber>
<ChargeDescription></ChargeDescription>
<Statute></Statute>
<Severity></Severity>
</Charge>
</Arrest>
<Arraignment>
<ArraignmentDate>04/18/2014</ArraignmentDate>
<ArraignmentAgency>CHITTENDEN CO. DISTRICT COURT</ArraignmentAgency>
<DocketNumber>REDACTED</DocketNumber>
<Charge>
<ChargeNumber>01</ChargeNumber>
<ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
<Statute>13V1043A2</Statute>
<Severity>FELONY</Severity>
</Charge>
</Arraignment>
<CourtDisposition>
<ChargeNumber>01</ChargeNumber>
<Convicted>NO</Convicted>
<Felony>NO</Felony>
<ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
<Statute>13V1043A2</Statute>
<Disposition>
06/09/2014 CASE DISMISSED
</Disposition>
</CourtDisposition>
</Cycle>
<Cycle>
<CycleNumber>2</CycleNumber>
<TrackingNumber>1685833</TrackingNumber>
<Arrest>
<DateOfArrest>09/30/2012 </DateOfArrest>
<ArrestAgency>WINOOSKI PD VT0040400</ArrestAgency>
<ArrestAgencyCaseNumber>12WS004770</ArrestAgencyCaseNumber>
<Fingerprint>NO</Fingerprint>
</Arrest>
<Arraignment>
<ArraignmentDate>10/01/2012</ArraignmentDate>
<ArraignmentAgency>CHITTENDEN CO. DISTRICT COURT</ArraignmentAgency>
<DocketNumber>REDACTED</DocketNumber>
<Charge>
<ChargeNumber>01</ChargeNumber>
<ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
<Statute>13V1043A2</Statute>
<Severity>FELONY</Severity>
</Charge>
</Arraignment>
<CourtDisposition>
<ChargeNumber>01</ChargeNumber>
<Convicted>NO</Convicted>
<Felony>NO</Felony>
<ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
<Statute>13V1043A2</Statute>
<Disposition>
12/02/2013 CASE DISMISSED
</Disposition>
</CourtDisposition>
</Cycle>
</CriminalHistory>
</Rapsheet>
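Thanks in advance.

Consider calling xmlToDataFrame from the XML package multiple times, iterating over the number of Cycle nodes. With lapply you can build a list of data frames, and rbind.fill() from the plyr package fills in any missing columns before row-binding. That is needed for any missing nodes (such as the Charge under Arrest, which is absent in the second cycle).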
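What have you tried? Please don't expect people to write all the code for you without any attempt. (Because XML is flexible enough to hold more than column- or row-oriented data, there is no generic solution for this task.)

Before finding the link above, I tried pretty much everything the poster there had tried. I don't understand what the solution is doing well enough to modify it for my needs. Thanks for your help.

This is helpful for getting the cycles, but I have to merge the cycles with some information elsewhere in the document: the Identification block. People have different numbers of cycles but only one Identification header. There is nothing in the Cycle block to merge the cycle data frame with the Identification block, and cbind only works with an equal number of rows, which I don't have.

Sorry, I meant the Record block. I need to attach the name/DOB to the Cycle.

See update. Simply go two node levels up from the current Cycle to reach the Identification element.

Thanks, this put me on the right track to do what I need to do.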
library(XML)
library(plyr)

doc <- xmlParse("path/To/XML.xml")

# Number of Cycle nodes in the document
cyclelen <- length(xpathSApply(doc, "//Cycle"))

# Build one data frame per Cycle, prefixing column names with their block.
# Note: //Cycle[i] matches the i-th Cycle under each parent, so this
# assumes the file holds a single Record.
dfList <- lapply(seq_len(cyclelen), function(i) {
  # Identification sits two levels up: Cycle -> CriminalHistory -> Rapsheet
  identification <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/../../Identification")))
  names(identification) <- paste0("Identification.", names(identification))

  cycle <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]")))
  names(cycle) <- paste0("Cycle.", names(cycle))

  arrest <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/Arrest")))
  names(arrest) <- paste0("Arrest.", names(arrest))

  arraign <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/Arraignment")))
  names(arraign) <- paste0("Arraignment.", names(arraign))

  court <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/CourtDisposition")))
  names(court) <- paste0("CourtDisposition.", names(court))

  cbind(identification, cycle, arrest, arraign, court)
})

# Stack the per-cycle data frames, filling missing columns with NA
df <- rbind.fill(dfList)

# Equivalent, more compact version: loop over the block names
dfList2 <- lapply(seq_len(cyclelen), function(i) {
  do.call(cbind,
    lapply(c("Identification", "Cycle", "Arrest", "Arraignment", "CourtDisposition"), function(n) {
      if (n == "Identification") {
        # Two levels up from the Cycle node
        df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/../../", n)))
      } else if (n == "Cycle") {
        # The Cycle node itself
        df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]")))
      } else {
        # Child blocks of the Cycle node
        df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[", i, "]/", n)))
      }
      names(df) <- paste0(n, ".", names(df))
      return(df)
    })
  )
})
df2 <- rbind.fill(dfList2)

# Both approaches give the same result
all.equal(df, df2)
# TRUE
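
The comments also ask how to attach the name and date of birth from the Record block to each cycle. One possible approach, sketched below, is to run XPath relative to each Cycle node and use the ancestor axis to reach its enclosing Record. The miniature document, the Records root element and all values in it are assumptions made up for illustration; only the element names LastName, DOB, Cycle and CycleNumber come from the sample data.

```r
library(XML)

# Hypothetical miniature document with two Records (invented values);
# the <Records> root is an assumption, not taken from the original file.
xml_text <- '<Records>
  <Record>
    <LastName>DOE</LastName>
    <DOB>01/01/1990</DOB>
    <Rapsheet><CriminalHistory>
      <Cycle><CycleNumber>1</CycleNumber></Cycle>
      <Cycle><CycleNumber>2</CycleNumber></Cycle>
    </CriminalHistory></Rapsheet>
  </Record>
  <Record>
    <LastName>ROE</LastName>
    <DOB>02/02/1985</DOB>
    <Rapsheet><CriminalHistory>
      <Cycle><CycleNumber>1</CycleNumber></Cycle>
    </CriminalHistory></Rapsheet>
  </Record>
</Records>'
mini_doc <- xmlParse(xml_text, asText = TRUE)

# Take every Cycle node, then query relative to each node so that
# ancestor::Record resolves to the Record that contains that Cycle.
cycle_nodes <- getNodeSet(mini_doc, "//Cycle")
rows <- lapply(cycle_nodes, function(node) {
  data.frame(
    Record.LastName   = xpathSApply(node, "ancestor::Record/LastName", xmlValue),
    Record.DOB        = xpathSApply(node, "ancestor::Record/DOB", xmlValue),
    Cycle.CycleNumber = xpathSApply(node, "./CycleNumber", xmlValue),
    stringsAsFactors  = FALSE
  )
})

# One row per Cycle, each carrying its own Record's name and DOB
cycles_with_names <- do.call(rbind, rows)
print(cycles_with_names)
```

Because each query starts from the Cycle node itself, this does not depend on the position of the Cycle in the whole document, so it should also cope with files that hold more than one Record.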