Reading multiple XML files in Python: ParseError: not well-formed (invalid token)


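I have to read nearly a thousand XML files and run clustering on the X and Y coordinates they contain. To do this, I open each file, run a few loops to extract the coordinates, and append them to a DataFrame. However, I get the common error below. Having gone through other solutions to this problem, I'm afraid that changing the encoding to UTF-8 does not fix it.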

ParseError: not well-formed (invalid token): line 1, column 1
One of the XML files looks like this. The Coordinates tags are inside the CrossMark tags.

<?xml version="1.0" encoding="utf-8"?>
<KinoveaVideoAnalysis>
  <FormatVersion>2.0</FormatVersion>
  <Producer>Kinovea.0.8.27</Producer>
  <OriginalFilename>Pagulayan vs Yapp (Last 16) 2019 US Open 9-ball 18</OriginalFilename>
  <FullPath>D:\Online Learning Courses\Kinovea Analysis\Pagulayan vs Yapp (Last 16) 2019 US Open 9-ball 18.jpg</FullPath>
  <ImageSize>1920;1080</ImageSize>
  <AverageTimeStampsPerFrame>1</AverageTimeStampsPerFrame>
  <CaptureFramerate>25</CaptureFramerate>
  <UserFramerate>25</UserFramerate>
  <FirstTimeStamp>0</FirstTimeStamp>
  <SelectionStart>1</SelectionStart>
  <Calibration>
    <CalibrationLine>
      <Origin>960;540</Origin>
      <Scale>1</Scale>
    </CalibrationLine>
    <Unit Abbreviation="px">Pixels</Unit>
  </Calibration>
  <Keyframes>
    <Keyframe id="d9c9e54a-66d8-4eed-82cc-d1bdb5c97eeb">
      <Position UserTime="0:00:00:00">1</Position>
      <Title>0:00:00:00</Title>
      <Drawings>
        <Plane id="a5ea55a7-3aca-4986-8fb3-386be2de5118" name="Perspective grid 1">
          <PointUpperLeft>279.8182;166.9091</PointUpperLeft>
          <PointUpperRight>1651.909;166.9091</PointUpperRight>
          <PointLowerRight>1651.909;871.3636</PointLowerRight>
          <PointLowerLeft>279.8182;871.3636</PointLowerLeft>
          <DrawingStyle>
            <Color Key="color">
              <Value>255;100;149;237</Value>
            </Color>
            <GridDivisions Key="divisions">
              <Value>8</Value>
            </GridDivisions>
            <Toggle Key="perspective">
              <Value>false</Value>
              <Variant>Perspective</Variant>
            </Toggle>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>true</AlwaysVisible>
            <UseDefault>false</UseDefault>
          </InfosFading>
        </Plane>
        <CrossMark id="25dda1e6-f474-4c51-a1bb-d17da0d3588a" name="Marker 1">
          <CenterPoint>1330.364;405</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>1281;283</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="370.36" UserXInvariant="370.36" UserY="135.00" UserYInvariant="135.00" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="9686f0e3-c2ae-4e61-b634-2bf49e29b16c" name="Marker 2">
          <CenterPoint>1266.545;336.2727</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>1217;214</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="306.55" UserXInvariant="306.55" UserY="203.73" UserYInvariant="203.73" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="e22268ef-0348-4aba-bacd-6830227fe877" name="Marker 3">
          <CenterPoint>1553.727;753.5455</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>1504;631</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="593.73" UserXInvariant="593.73" UserY="-213.55" UserYInvariant="-213.55" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="26bcf39d-75ea-4353-b26b-47bb86549a16" name="Marker 4">
          <CenterPoint>1445.727;758.4545</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>1396;636</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="485.73" UserXInvariant="485.73" UserY="-218.45" UserYInvariant="-218.45" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="d45e5715-09d0-4f98-b533-71f30fb23fb6" name="Marker 5">
          <CenterPoint>1620;844.3636</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>1571;722</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="660.00" UserXInvariant="660.00" UserY="-304.36" UserYInvariant="-304.36" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="96fbec04-de71-4a98-a55b-9f4f5d62b4bf" name="Marker 6">
          <CenterPoint>682.3636;635.7273</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>633;513</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="-277.64" UserXInvariant="-277.64" UserY="-95.73" UserYInvariant="-95.73" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="59209987-854f-4d6a-95e4-8d4a35bc1b70" name="Marker 7">
          <CenterPoint>684.8182;687.2727</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>635;565</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="-275.18" UserXInvariant="-275.18" UserY="-147.27" UserYInvariant="-147.27" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="47483b27-2d37-4448-bb39-3c43efa6d29b" name="Marker 8">
          <CenterPoint>552.2727;392.7273</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>503;270</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="-407.73" UserXInvariant="-407.73" UserY="147.27" UserYInvariant="147.27" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="02448ccf-5298-4244-b4ab-05c41f2df372" name="Marker 9">
          <CenterPoint>390.2727;481.0909</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>341;359</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="-569.73" UserXInvariant="-569.73" UserY="58.91" UserYInvariant="58.91" UserUnitLength="px" />
        </CrossMark>
        <CrossMark id="fb18c303-3e63-4c55-9000-80dc90bc65ba" name="Marker 10">
          <CenterPoint>260.1818;142.3636</CenterPoint>
          <ExtraData>None</ExtraData>
          <MeasureLabel>
            <SpacePosition>211;20</SpacePosition>
            <TimePosition>0</TimePosition>
          </MeasureLabel>
          <DrawingStyle>
            <Color Key="back color">
              <Value>255;0;0;0</Value>
            </Color>
          </DrawingStyle>
          <InfosFading>
            <Enabled>true</Enabled>
            <Frames>20</Frames>
            <AlwaysVisible>false</AlwaysVisible>
            <UseDefault>true</UseDefault>
          </InfosFading>
          <Coordinates UserX="-699.82" UserXInvariant="-699.82" UserY="397.64" UserYInvariant="397.64" UserUnitLength="px" />
        </CrossMark>
      </Drawings>
    </Keyframe>
  </Keyframes>
  <CoordinateSystem id="48cc63a9-8e28-4c15-bef5-6404756a50b1" name="Coordinate System 1">
    <Visible>false</Visible>
    <DrawingStyle>
      <Color Key="line color">
        <Value>255;255;0;0</Value>
      </Color>
    </DrawingStyle>
  </CoordinateSystem>
  <Trackability />
</KinoveaVideoAnalysis>
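In the file above, every Coordinates element is a direct child of a CrossMark, so a single ElementPath expression can reach all of them without nested loops. A minimal sketch, using a trimmed copy of the first two markers from the file (the SAMPLE string is illustrative only, not a complete Kinovea file):

```python
import xml.etree.ElementTree as et

# Trimmed copy of the Kinovea file above: only two markers are kept (illustrative).
SAMPLE = """<KinoveaVideoAnalysis>
  <Keyframes>
    <Keyframe id="d9c9e54a-66d8-4eed-82cc-d1bdb5c97eeb">
      <Drawings>
        <CrossMark name="Marker 1">
          <Coordinates UserX="370.36" UserY="135.00" UserUnitLength="px" />
        </CrossMark>
        <CrossMark name="Marker 2">
          <Coordinates UserX="306.55" UserY="203.73" UserUnitLength="px" />
        </CrossMark>
      </Drawings>
    </Keyframe>
  </Keyframes>
</KinoveaVideoAnalysis>"""

root = et.fromstring(SAMPLE)

# './/CrossMark/Coordinates' matches any Coordinates element whose parent is a
# CrossMark, at any depth below the root element.
coords = root.findall('.//CrossMark/Coordinates')

# Attribute values are strings, so convert them to floats for numeric work.
xs = [float(c.get('UserX')) for c in coords]
ys = [float(c.get('UserY')) for c in coords]
print(xs, ys)  # [370.36, 306.55] [135.0, 203.73]
```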
In addition, a simplified version of the code works on a single file. This code

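import xml.etree.ElementTree as et
import pandas as pd

Xdemo = []
Ydemo = []

# Raw string (r"...") so the backslash in the Windows path is not read as an escape sequence
onetree = et.parse(r"XML ex-KVA\World Cup of Pool 2018 SF Austria vs China B 11.xml")
root = onetree.getroot()
# Keyframes > Keyframe > CrossMark > Coordinates; .iter() already walks the subtree,
# so the placeholder et.Element('root') objects are not needed
for kfs in root.iter('Keyframes'):
    for kf in kfs.iter('Keyframe'):
        for cm in kf.iter('CrossMark'):
            for coord in cm.iter('Coordinates'):
                # Attribute values are strings; convert them to numbers for clustering
                Xdemo.append(float(coord.attrib['UserX']))
                Ydemo.append(float(coord.attrib['UserY']))

dfDemo = pd.DataFrame({'X': Xdemo, 'Y': Ydemo})

print(dfDemo)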
…gives:

         X        Y
0  -243.27    93.27
1  -302.18    27.00
2  -535.36    71.18
3  -528.00   157.09
4  -429.82   196.36
5  -402.82   324.00
6    29.18   120.27
7   247.64   198.82
8   328.64  -100.64
9  -702.27  -328.91

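I'm still new to Python, so I would be very grateful if you could point out any other problems with my code, or suggest a better way to run through the DataFrame (10 coordinates at a time) and produce results.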
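For the multi-file case, one option is to collect every marker into a single long DataFrame with a column recording which file it came from, then process one file (10 markers) at a time with groupby. A minimal sketch, assuming all the files sit in one folder; the folder name XML ex-KVA is taken from the question, while read_markers, read_folder and the file column are hypothetical names:

```python
import glob
import os
import xml.etree.ElementTree as et

import pandas as pd


def read_markers(path):
    """Return one dict per CrossMark in a Kinovea .xml file (hypothetical helper)."""
    root = et.parse(path).getroot()
    rows = []
    for coord in root.iterfind('.//CrossMark/Coordinates'):
        rows.append({
            'file': os.path.basename(path),   # which image this marker belongs to
            'X': float(coord.get('UserX')),   # attributes are strings; convert them
            'Y': float(coord.get('UserY')),
        })
    return rows


def read_folder(folder):
    """Read every .xml file in `folder` into one long DataFrame (hypothetical helper)."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, '*.xml'))):
        rows.extend(read_markers(path))
    return pd.DataFrame(rows, columns=['file', 'X', 'Y'])


if __name__ == '__main__':
    df = read_folder('XML ex-KVA')
    # One group per file, i.e. one set of markers per analysed image
    for name, group in df.groupby('file'):
        print(name, len(group))
```

Keeping everything in one DataFrame means the per-file step (for example, clustering the 10 points of one image) only has to be written once and applied to each group.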

The solution seems to be adding encoding="utf-8" to the with open statement, as in

with open(filename, 'r', encoding="utf-8") as current_file:

Only then does the parser work.
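A minimal sketch of the fix described above, assuming the files really are UTF-8 text: open each file in text mode with an explicit encoding and hand the file object to et.parse, which accepts either a filename or a file object. This only controls how the bytes are decoded, so it helps only if decoding was the problem. parse_utf8 is a hypothetical helper name:

```python
import xml.etree.ElementTree as et


def parse_utf8(filename):
    """Parse an XML file after decoding it explicitly as UTF-8 (hypothetical helper)."""
    # An explicit encoding avoids relying on the platform's default text encoding.
    with open(filename, 'r', encoding='utf-8') as current_file:
        return et.parse(current_file)
```

Usage: tree = parse_utf8(r"XML ex-KVA\World Cup of Pool 2018 SF Austria vs China B 11.xml"), then tree.getroot() as in the single-file code.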

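Obviously, you should update your code to identify the bad XML file and look at that file specifically.

I don't understand what you mean. All the XML files (nearly a thousand) have the same structure, and there are no non-XML files in the folder. So I don't see what is wrong with the structure of the files.

But you do have a problem. (We cannot trust your assessment that all your files are the same when you are simultaneously telling us that one of them fails.) For our sake and yours, please reduce the search space of the problem to the minimal data and code that show it. Create and post a minimal reproducible example; note minimal and reproducible (on our end).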
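To follow the suggestion of identifying the bad file among a thousand, the loop can catch xml.etree.ElementTree.ParseError per file, record the filename together with the error's position attribute (line, column) and the first few raw bytes, and keep going instead of stopping at the first failure. A minimal sketch; find_bad_files and n_bytes are hypothetical names, and printing the leading bytes is just one way to see what is actually at line 1, column 1 of a failing file:

```python
import glob
import os
import xml.etree.ElementTree as et


def find_bad_files(folder, n_bytes=16):
    """Try to parse every .xml file in `folder`; return (name, position, head) for failures."""
    bad = []
    for path in sorted(glob.glob(os.path.join(folder, '*.xml'))):
        try:
            et.parse(path)
        except et.ParseError as err:
            with open(path, 'rb') as fh:
                head = fh.read(n_bytes)  # raw bytes, before any decoding
            bad.append((os.path.basename(path), err.position, head))
    return bad


if __name__ == '__main__':
    for name, position, head in find_bad_files('XML ex-KVA'):
        print(name, position, head)
```

The resulting short list of failing files is also a natural starting point for the minimal reproducible example the comment asks for.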