Reading multiple XML files in Python: ParseError: not well-formed (invalid token)
I have to read nearly a thousand XML files and run clustering on the X and Y coordinates they contain. To do this, I open each file, run a few loops to collect the coordinates, and append them to a DataFrame. However, I get the common error below. Having gone through the other solutions posted for this problem, I am afraid that changing the encoding to UTF-8 does not fix it.
ParseError: not well-formed (invalid token): line 1, column 1
One of the XML files is shown below. The Coordinates tags sit inside the CrossMark tags.
<?xml version="1.0" encoding="utf-8"?>
<KinoveaVideoAnalysis>
<FormatVersion>2.0</FormatVersion>
<Producer>Kinovea.0.8.27</Producer>
<OriginalFilename>Pagulayan vs Yapp (Last 16) 2019 US Open 9-ball 18</OriginalFilename>
<FullPath>D:\Online Learning Courses\Kinovea Analysis\Pagulayan vs Yapp (Last 16) 2019 US Open 9-ball 18.jpg</FullPath>
<ImageSize>1920;1080</ImageSize>
<AverageTimeStampsPerFrame>1</AverageTimeStampsPerFrame>
<CaptureFramerate>25</CaptureFramerate>
<UserFramerate>25</UserFramerate>
<FirstTimeStamp>0</FirstTimeStamp>
<SelectionStart>1</SelectionStart>
<Calibration>
<CalibrationLine>
<Origin>960;540</Origin>
<Scale>1</Scale>
</CalibrationLine>
<Unit Abbreviation="px">Pixels</Unit>
</Calibration>
<Keyframes>
<Keyframe id="d9c9e54a-66d8-4eed-82cc-d1bdb5c97eeb">
<Position UserTime="0:00:00:00">1</Position>
<Title>0:00:00:00</Title>
<Drawings>
<Plane id="a5ea55a7-3aca-4986-8fb3-386be2de5118" name="Perspective grid 1">
<PointUpperLeft>279.8182;166.9091</PointUpperLeft>
<PointUpperRight>1651.909;166.9091</PointUpperRight>
<PointLowerRight>1651.909;871.3636</PointLowerRight>
<PointLowerLeft>279.8182;871.3636</PointLowerLeft>
<DrawingStyle>
<Color Key="color">
<Value>255;100;149;237</Value>
</Color>
<GridDivisions Key="divisions">
<Value>8</Value>
</GridDivisions>
<Toggle Key="perspective">
<Value>false</Value>
<Variant>Perspective</Variant>
</Toggle>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>true</AlwaysVisible>
<UseDefault>false</UseDefault>
</InfosFading>
</Plane>
<CrossMark id="25dda1e6-f474-4c51-a1bb-d17da0d3588a" name="Marker 1">
<CenterPoint>1330.364;405</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>1281;283</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="370.36" UserXInvariant="370.36" UserY="135.00" UserYInvariant="135.00" UserUnitLength="px" />
</CrossMark>
<CrossMark id="9686f0e3-c2ae-4e61-b634-2bf49e29b16c" name="Marker 2">
<CenterPoint>1266.545;336.2727</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>1217;214</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="306.55" UserXInvariant="306.55" UserY="203.73" UserYInvariant="203.73" UserUnitLength="px" />
</CrossMark>
<CrossMark id="e22268ef-0348-4aba-bacd-6830227fe877" name="Marker 3">
<CenterPoint>1553.727;753.5455</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>1504;631</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="593.73" UserXInvariant="593.73" UserY="-213.55" UserYInvariant="-213.55" UserUnitLength="px" />
</CrossMark>
<CrossMark id="26bcf39d-75ea-4353-b26b-47bb86549a16" name="Marker 4">
<CenterPoint>1445.727;758.4545</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>1396;636</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="485.73" UserXInvariant="485.73" UserY="-218.45" UserYInvariant="-218.45" UserUnitLength="px" />
</CrossMark>
<CrossMark id="d45e5715-09d0-4f98-b533-71f30fb23fb6" name="Marker 5">
<CenterPoint>1620;844.3636</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>1571;722</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="660.00" UserXInvariant="660.00" UserY="-304.36" UserYInvariant="-304.36" UserUnitLength="px" />
</CrossMark>
<CrossMark id="96fbec04-de71-4a98-a55b-9f4f5d62b4bf" name="Marker 6">
<CenterPoint>682.3636;635.7273</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>633;513</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="-277.64" UserXInvariant="-277.64" UserY="-95.73" UserYInvariant="-95.73" UserUnitLength="px" />
</CrossMark>
<CrossMark id="59209987-854f-4d6a-95e4-8d4a35bc1b70" name="Marker 7">
<CenterPoint>684.8182;687.2727</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>635;565</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="-275.18" UserXInvariant="-275.18" UserY="-147.27" UserYInvariant="-147.27" UserUnitLength="px" />
</CrossMark>
<CrossMark id="47483b27-2d37-4448-bb39-3c43efa6d29b" name="Marker 8">
<CenterPoint>552.2727;392.7273</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>503;270</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="-407.73" UserXInvariant="-407.73" UserY="147.27" UserYInvariant="147.27" UserUnitLength="px" />
</CrossMark>
<CrossMark id="02448ccf-5298-4244-b4ab-05c41f2df372" name="Marker 9">
<CenterPoint>390.2727;481.0909</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>341;359</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="-569.73" UserXInvariant="-569.73" UserY="58.91" UserYInvariant="58.91" UserUnitLength="px" />
</CrossMark>
<CrossMark id="fb18c303-3e63-4c55-9000-80dc90bc65ba" name="Marker 10">
<CenterPoint>260.1818;142.3636</CenterPoint>
<ExtraData>None</ExtraData>
<MeasureLabel>
<SpacePosition>211;20</SpacePosition>
<TimePosition>0</TimePosition>
</MeasureLabel>
<DrawingStyle>
<Color Key="back color">
<Value>255;0;0;0</Value>
</Color>
</DrawingStyle>
<InfosFading>
<Enabled>true</Enabled>
<Frames>20</Frames>
<AlwaysVisible>false</AlwaysVisible>
<UseDefault>true</UseDefault>
</InfosFading>
<Coordinates UserX="-699.82" UserXInvariant="-699.82" UserY="397.64" UserYInvariant="397.64" UserUnitLength="px" />
</CrossMark>
</Drawings>
</Keyframe>
</Keyframes>
<CoordinateSystem id="48cc63a9-8e28-4c15-bef5-6404756a50b1" name="Coordinate System 1">
<Visible>false</Visible>
<DrawingStyle>
<Color Key="line color">
<Value>255;255;0;0</Value>
</Color>
</DrawingStyle>
</CoordinateSystem>
<Trackability />
</KinoveaVideoAnalysis>
In addition, a simplified version of the code works on a single file. This code
import xml.etree.ElementTree as et
import pandas as pd

Xdemo = []
Ydemo = []
# Raw string so the backslash in the Windows path is not read as an escape
onetree = et.parse(r"XML ex-KVA\World Cup of Pool 2018 SF Austria vs China B 11.xml")
root = onetree.getroot()
# Walk Keyframes > Keyframe > CrossMark > Coordinates
for kfs in root.iter('Keyframes'):
    for kf in kfs.iter('Keyframe'):
        for cm in kf.iter('CrossMark'):
            for coord in cm.iter('Coordinates'):
                # Attribute values are kept as strings, as in the original
                Xdemo.append(coord.attrib['UserX'])
                Ydemo.append(coord.attrib['UserY'])
dfDemo = pd.DataFrame({'X': Xdemo, 'Y': Ydemo})
print(dfDemo)
...gives:
X Y
0 -243.27 93.27
1 -302.18 27.00
2 -535.36 71.18
3 -528.00 157.09
4 -429.82 196.36
5 -402.82 324.00
6 29.18 120.27
7 247.64 198.82
8 328.64 -100.64
9 -702.27 -328.91
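Since iter() already searches the whole subtree, the four nested loops above can probably be collapsed into a single path query. The following is only a minimal sketch, assuming every file has the same layout as the sample shown earlier; the function name extract_coordinates is made up for illustration, and the float conversion assumes a dot as the decimal separator, as in the sample.

```python
import xml.etree.ElementTree as et


def extract_coordinates(xml_source):
    """Return a list of (x, y) float pairs from one Kinovea-style file.

    xml_source may be a path or a file-like object; et.parse accepts both.
    """
    root = et.parse(xml_source).getroot()
    # Path query: every Coordinates element that is a direct child of a CrossMark
    return [
        (float(c.get("UserX")), float(c.get("UserY")))
        for c in root.findall(".//CrossMark/Coordinates")
    ]
```

The returned list can then be handed to pandas, for example pd.DataFrame(rows, columns=['X', 'Y']), which gives one row per marker.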
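I am still new to Python, so if there is anything else wrong with my code, or a better way to build the DataFrame (10 coordinates at a time) and produce results, I would be very grateful for suggestions.

The solution seems to be to add
encoding="utf-8"
to the with open statement, as in
with open(filename, 'r', encoding="utf-8") as current_file:
Only then does the parser work.

Clearly, you should update your code to identify the offending XML file and look at it specifically.

I don't understand what you mean. All of the XML files (nearly a thousand) have the same structure, and there are no non-XML files in the folder. So I don't see what could be wrong with the structure of the files.

But you do have a problem. (We can't trust your assessment that all of your files are the same when you are telling us at the same time that one of them fails.) For our sake and yours, reduce the search space of the problem to the smallest data and code that still shows it. Create and post an example; note that it must be minimal and reproducible (on our side).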
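The advice above, opening each file with an explicit encoding and finding out which file actually fails, could be sketched roughly as follows. This is an illustration and not the asker's code: the function name find_bad_xml and the folder argument are hypothetical, and it assumes all files sit in one folder with an .xml extension.

```python
import xml.etree.ElementTree as et
from pathlib import Path


def find_bad_xml(folder, encoding="utf-8"):
    """Try to parse every *.xml file in folder; return (good, bad) lists.

    bad holds (path, error message, first raw bytes) for each failure,
    so the offending files can be inspected individually.
    """
    good, bad = [], []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            # Open with an explicit encoding, as suggested in the answer
            with open(path, "r", encoding=encoding) as current_file:
                et.parse(current_file)
            good.append(path)
        except (et.ParseError, UnicodeDecodeError) as err:
            # Keep the first bytes: stray leading bytes often explain
            # an error reported at line 1, column 1
            bad.append((path, str(err), path.read_bytes()[:16]))
    return good, bad
```

Printing the entries of bad should show which files fail and what their first bytes look like, which narrows the search to a single small file that can be posted as an example.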