Handling multiple nodes when parsing XML with Python

python mysql xml

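For an assignment, I need to parse a 2-million-line XML file and load the data into a MySQL database. Since we use a Python environment and SQLite for the class, I am trying to use Python to parse the file. Keep in mind that I am just learning Python, so everything is new to me.

I have tried several times but keep failing, and I am getting more and more frustrated. To be efficient, I am testing my code on only a small piece of the full XML, shown below: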

<pub>
<ID>7</ID>
<title>On the Correlation of Image Size to System Accuracy in Automatic Fingerprint Identification Systems</title>
<year>2003</year>
<booktitle>AVBPA</booktitle>
<pages>895-902</pages>
<authors>
    <author>J. K. Schneider</author>
    <author>C. E. Richardson</author>
    <author>F. W. Kiefer</author>
    <author>Venu Govindaraju</author>
</authors>
</pub>

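Note that what you get here is the number of characters in the first author's name, because the code limits the result to the first author (index 0) and then takes the length of that string: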

author = authors.getElementsByTagName("author")[0].firstChild.data
num_authors = len(author)
print("Number of authors: ", num_authors )

Simply don't limit the result, so that you get all the authors:

author = authors.getElementsByTagName("author")
num_authors = len(author)
print("Number of authors: ", num_authors )
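To see the difference between the two snippets side by side, here is a minimal self-contained sketch that runs them against the sample record from the question. It assumes the `authors` variable in the snippets is the `<authors>` element obtained with `xml.dom.minidom`; the names `SAMPLE`, `first_author`, and `author_nodes` are made up for this illustration.

```python
from xml.dom import minidom

# The sample record from the question, embedded as a string.
SAMPLE = """<pub>
<ID>7</ID>
<title>On the Correlation of Image Size to System Accuracy in Automatic Fingerprint Identification Systems</title>
<year>2003</year>
<booktitle>AVBPA</booktitle>
<pages>895-902</pages>
<authors>
    <author>J. K. Schneider</author>
    <author>C. E. Richardson</author>
    <author>F. W. Kiefer</author>
    <author>Venu Govindaraju</author>
</authors>
</pub>"""

doc = minidom.parseString(SAMPLE)
authors = doc.getElementsByTagName("authors")[0]

# Indexing with [0] picks one element; .firstChild.data is its text (a string).
first_author = authors.getElementsByTagName("author")[0].firstChild.data
# Without the index you keep the whole list of <author> elements.
author_nodes = authors.getElementsByTagName("author")

chars_in_first_name = len(first_author)  # length of the string "J. K. Schneider"
num_authors = len(author_nodes)          # number of <author> elements

print("Characters in first author's name:", chars_in_first_name)  # 15
print("Number of authors:", num_authors)                          # 4
```

The first `len()` counts characters in a string, the second counts elements in a node list, which is why the original code reported 15 rather than 4.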

You can use a list comprehension to get all the author names in a list, rather than the author elements:

author = [a.firstChild.data for a in authors.getElementsByTagName("author")]
print(author)
# [u'J. K. Schneider', u'C. E. Richardson', u'F. W. Kiefer', u'Venu Govindaraju']
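Since the end goal is to get the data into a database, the same list comprehension can be applied to each `<pub>` in turn. The following is only a sketch under several assumptions: the records are wrapped in a root element (called `<pubs>` here, which is not in the question), the table and column names are invented, the second record is a made-up placeholder, and `sqlite3` stands in for MySQL because it ships with Python. The `text_of` helper is hypothetical as well.

```python
import sqlite3
from xml.dom import minidom

# Assumed layout: several <pub> records under a hypothetical <pubs> root.
# Record 7 uses the authors from the question; record 8 is a placeholder.
SAMPLE = """<pubs>
<pub><ID>7</ID><title>On the Correlation of Image Size to System Accuracy in Automatic Fingerprint Identification Systems</title>
<authors>
    <author>J. K. Schneider</author>
    <author>C. E. Richardson</author>
    <author>F. W. Kiefer</author>
    <author>Venu Govindaraju</author>
</authors></pub>
<pub><ID>8</ID><title>Placeholder Title</title>
<authors>
    <author>Example Author</author>
</authors></pub>
</pubs>"""


def text_of(parent, tag):
    """Return the text of the first <tag> under parent, or '' if it is empty."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data if node.firstChild else ""


conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
conn.execute("CREATE TABLE pub (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE author (pub_id INTEGER, name TEXT)")

doc = minidom.parseString(SAMPLE)
for pub in doc.getElementsByTagName("pub"):
    pub_id = int(text_of(pub, "ID"))
    conn.execute("INSERT INTO pub VALUES (?, ?)", (pub_id, text_of(pub, "title")))
    # One row per author, all linked to the publication by its ID.
    names = [a.firstChild.data for a in pub.getElementsByTagName("author")]
    conn.executemany("INSERT INTO author VALUES (?, ?)",
                     [(pub_id, name) for name in names])
conn.commit()

rows = conn.execute(
    "SELECT pub_id, COUNT(*) FROM author GROUP BY pub_id ORDER BY pub_id"
).fetchall()
print(rows)  # [(7, 4), (8, 1)]
```

Keeping authors in their own table avoids having to guess a maximum number of authors per publication. For a file with millions of lines, `minidom` holds the whole document in memory, so a streaming parser such as `xml.etree.ElementTree.iterparse` may be worth a look.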

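I knew I needed to access each item in the array, but I wasn't sure of the syntax. Thank you very much!

Hey @har07, I've made progress, but some of my XML data is "bad" in a sense. I have an entry whose name contains a special character such as "í", which appears in the XML file as "&iacute;". How do I handle these special characters in Python? The error I get is "ExpatError: undefined entity".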
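One possible way to deal with this: XML itself only predefines `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;`, so an HTML-style name like `&iacute;` is unknown to the parser unless a DTD declares it. The sketch below replaces such names with the actual characters before parsing, using the standard library's `html.entities.name2codepoint` table. The function name `replace_html_entities` and the sample author names are made up for this illustration, and it assumes the file is small enough to preprocess as a string (or that you apply the function line by line).

```python
import re
from html.entities import name2codepoint
from xml.dom import minidom

# The five entities XML predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def replace_html_entities(xml_text):
    """Swap HTML named entities (e.g. &iacute;) for their characters."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown names alone
        return chr(name2codepoint[name])

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", substitute, xml_text)


# Hypothetical input; parsing it unchanged would fail on &iacute;.
raw = "<authors><author>Mart&iacute;n</author><author>A &amp; B</author></authors>"

cleaned = replace_html_entities(raw)
doc = minidom.parseString(cleaned)
name = doc.getElementsByTagName("author")[0].firstChild.data
print(name)  # Martín
```

`&amp;` is deliberately left untouched, because turning it into a bare `&` would make the XML invalid.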
