Need help parsing an XML file with Python


I have a folder containing many XML files. Below is one of them, which I will refer to as xmlstr:

    <?xml version="1.0"?>
    <case>
    <name>Sharman Networks Ltd v Universal Music Australia Pty Ltd [2006] FCA 1 (5 January 2006)</name>
    <AustLII>http://www.austlii.edu.au/au/cases/cth/FCA/2006/1.html</AustLII>
    <citations>
    <citation "id=c0">

    <class>cited</class>
    <tocase>Universal Music Australia Pty Ltd v Sharman License Holdings Ltd (2005) 220 ALR 1</tocase>
    <text>2 Wilcox J delivered judgment on the complex issues of liability arising in the primary proceedings on 5 September 2005 ( Universal Music Australia Pty Ltd v Sharman License Holdings Ltd (2005) 220 ALR 1). In the meantime, Ms Hemming had filed two disclosure affidavits pursuant to Wilcox J's orders of 22 March 2005 whilst Sharman License and Sharman Networks had unsuccessfully sought several stays on various grounds of that same order insofar as it applied to them (see Universal Music Australia Pty Ltd v Sharman License Holdings Ltd [2005] FCA 406 per Hely J, delivered 8 April 2005; Universal Music Australia Pty Ltd v Sharman License Holdings Ltd [2005] FCA 441 per Wilcox J, delivered 15 April 2005 and Sharman License Holdings Ltd v Universal Music Australia Pty Ltd [2005] FCA 505 per Moore J, delivered 28 April 2005). Disclosure affidavits were eventually sworn on behalf of Sharman License and Sharman Networks by Mr Gee on 19 April 2005, which were later superseded by further affidavits sworn also by Mr Gee on 16 June 2005. Sharman License and Sharman Networks had also unsuccessfully sought an enlargement of time in which to file an application for leave to appeal from Wilcox J's orders of 22 March 2005 (see Sharman License Holdings Ltd v Universal Music Australia Pty Ltd [2005] FCA 802 per Lindgren J, delivered on 17 June 2005).</text>
    </citation>
    <citation "id=c1">
    <class>cited</class>
<text>2 Wilcox J delivered judgment on the complex issues of liability arising in the primary proceedings on 5 September 2005 ( Universal Music Australia Pty Ltd v Sharman License Holdings Ltd (2005) 220 ALR 1). In the meantime, Ms Hemming had filed two disclosure affidavits pursuant to Wilcox J's orders of 22 March 2005 whilst Sharman License and Sharman Networks had unsuccessfully sought several stays on various grounds of that same order insofar as it applied to them (see Universal Music Australia Pty Ltd v Sharman License Holdings Ltd [2005] FCA 406 per Hely J, delivered 8 April 2005; Universal Music Australia Pty Ltd v Sharman License Holdings Ltd [2005] FCA 441 per Wilcox J, delivered 15 April 2005 and Sharman License Holdings Ltd v Universal Music Australia Pty Ltd [2005] FCA 505 per Moore J, delivered 28 April 2005). Disclosure affidavits were eventually sworn on behalf of Sharman License and Sharman Networks by Mr Gee on 19 April 2005, which were later superseded by further affidavits sworn also by Mr Gee on 16 June 2005. Sharman License and Sharman Networks had also unsuccessfully sought an enlargement of time in which to file an application for leave to appeal from Wilcox J's orders of 22 March 2005 (see Sharman License Holdings Ltd v Universal Music Australia Pty Ltd [2005] FCA 802 per Lindgren J, delivered on 17 June 2005).

24 All that was referrable of course to the implications of the payment of $1,116,405.63 by Ms Hemming to TIL, following the sale of her Sydney residence on 4 February 2005; that payment appears to have been made out of the proceeds of a sale of that residence, which was effected for the gross price of $2,100,000 to a person identified by the evidence as an accountant of certain of the Sharman companies. There was no sufficiently detailed or otherwise cogent evidence as to who exercised the substantial or underlying control of decision making of TIL, or as to the basis of or reasons for such alleged indebtedness having crystallised in the first place. The state of the evidence as to the control of TIL was itself the subject of disputation before Moore J and senior counsel for the Sharman applicants sought to attribute error to his Honour's judgment for the further reason that he had failed to make a finding as to Ms Hemming's control, or otherwise, of that entity. The Sharman applicants postulated that the 'remark' made by Lindgren J at [13] of his Honour's reasons for judgment in Sharman License Holdings Ltd v Universal Music Australia Pty Ltd [2005] FCA 802 that '[Wilcox J] accepted [in the course of granting the Mareva relief on 22 March 2005] that the Sharman Companies were controlled by Ms Hemming by reason of a "client services agreement" between her and TIL dated 8 April 2002' was an 'unsure foundation for any finding of control of the Sharman trust or the Sharman companies [by Ms Hemming]', and was thus inappropriately or impermissibly relied upon by Moore J in formulating his reasons for judgment. 
That submission lacked merit, particularly in the light of [31] of Lindgren J's reasons for judgment in which his Honour paraphrased the two-fold acceptance, given in cross-examination by the solicitor acting for Sharman License and Sharman Networks in their application before Lindgren J, that TIL as trustee of the Sharman trust was the ultimate beneficial owner of all the shares issued in Sharman License and Sharman Networks, and moreover that Wilcox J had himself appeared to accept that in consequence of the client services agreement, Ms Hemming 'controlled the Sharman trust'.

25 The Music companies had submitted to Moore J that given the evidentiary shortcomings on a subject readily susceptible to documentary demonstration, inclusive of banking records I might add, there was in truth and reality no antecedent loan, that the transfer of those funds by Ms Hemming to TIL in Vanuatu constituted a sham transaction, and consequently that those monies remained her own property beneficially, and should have been identified and disclosed as such in her affidavit provided in the Mareva context. Once more, so it was asserted by the Sharman applicants, his Honour declined to make any concluded finding on the subject. The point is however that his Honour had been able to infer from the surrounding circumstances I have already outlined that there was some force in the Music companies' submission. But in any event his Honour was of the view that he could permit cross-examination of Ms Hemming on and in relation to those matters because at least doubt existed in relation to that area of enquiry.</text>
</citation>
<citation "id=c5">
<class>cited</class>
<tocase>D&eacute;cor Corporation Pty Ltd v Dart Industries Inc (1991) 33 FCR 397</tocase>
<text>6 Section 24(1A) of the Federal Court of Australia Act 1976 (Cth) stipulates that an appeal shall not be brought from a judgment of the Court constituted by a single judge, being a judgment that is interlocutory in nature, unless the Court or a Judge gives leave to appeal. Although s 24(1A) does not purport to qualify or limit the Court's discretion (see D&eacute;cor Corporation Pty Ltd v Dart Industries Inc (1991) 33 FCR 397 at 399 in the joint reasons for judgment of Sheppard, Burchett and Heerey JJ), the Courts have developed general principles which inform the exercise of the discretion to refuse or grant leave to appeal from an interlocutory judgment. The rationale for those principles is the public interest in the efficient administration of justice, and the maintenance of 'the integrity and vigour of the procedures of the court, including as they do, the immediate involvement of the judge at all stages in the progress of cases to trial' ( Bomanite Pty Ltd v Slatex Corp Australia Pty Ltd (1991) 104 ALR 165 at 173, per Gummow J). One consequence sought to be avoided is the expansion of expensive and delaying pre-trial litigation involved in appeals on issues of practice and procedure, and the concomitant reduction in the authority of the trial judge, should such appeals be frequently entertained ( Bomanite at 176, per French J).

 "...I am of the opinion that...there is a material difference between an exercise of discretion on a point of practice or procedure and an exercise of discretion which determines substantive rights. In the former class of case, if a tight rein were not kept upon interference with the orders of Judges of first instance, the result would be disastrous to the proper administration of justice. The disposal of cases could be delayed interminably, and costs heaped up indefinitely, if a litigant with a long purse or a litigious disposition could, at will, in effect transfer all exercises of discretion in interlocutory applications from a Judge in chambers to a Court of Appeal."



 ...It is safe to say that the question of injustice flowing from the order appealed from will generally be a relevant and necessary consideration.'</text>
</citation>
<citation "id=c6">
<class>cited</class>
<tocase>Bomanite Pty Ltd v Slatex Corp Australia Pty Ltd (1991) 104 ALR 165</tocase>
<text>6 Section 24(1A) of the Federal Court of Australia Act 1976 (Cth) stipulates that an appeal shall not be brought from a judgment of the Court constituted by a single judge, being a judgment that is interlocutory in nature, unless the Court or a Judge gives leave to appeal. Although s 24(1A) does not purport to qualify or limit the Court's discretion (see D&eacute;cor Corporation Pty Ltd v Dart Industries Inc (1991) 33 FCR 397 at 399 in the joint reasons for judgment of Sheppard, Burchett and Heerey JJ), the Courts have developed general principles which inform the exercise of the discretion to refuse or grant leave to appeal from an interlocutory judgment. The rationale for those principles is the public interest in the efficient administration of justice, and the maintenance of 'the integrity and vigour of the procedures of the court, including as they do, the immediate involvement of the judge at all stages in the progress of cases to trial' ( Bomanite Pty Ltd v Slatex Corp Australia Pty Ltd (1991) 104 ALR 165 at 173, per Gummow J). One consequence sought to be avoided is the expansion of expensive and delaying pre-trial litigation involved in appeals on issues of practice and procedure, and the concomitant reduction in the authority of the trial judge, should such appeals be frequently entertained ( Bomanite at 176, per French J)'.</text>
</citation>
<citation "id=c7">
<class>cited</class>

    <tocase>Adam P Brown Male Fashions Proprietary Limited v Phillip Morris Incorporated [1981] HCA 39 ; (1981) 148 CLR 170</tocase>
    <AustLII>http://www.austlii.edu.au/au/cases//cth/HCA/1981/39.html</AustLII>
    <text>7 At least for those reasons, this Court has held on a number of occasions that typically a party seeking leave to appeal from an interlocutory judgment ought to establish, first, that in all the circumstances, the decision from which leave is sought to appeal is attended with sufficient doubt to warrant the same being reconsidered by the Full Court, and secondly, that substantial injustice would result if such leave was to be refused, supposing the decision to have been wrong: see D&eacute;cor at 398. That those two questions were the touchstone of exercise of discretion in matters of this kind was common ground between the parties. I observe that it is well accepted that those criteria are not to be applied rigidly or fixedly, and the Court must bear in mind all of the circumstances of the particular case: see in that regard Adam P Brown Male Fashions Proprietary Limited v Phillip Morris Incorporated [1981] HCA 39 ; (1981) 148 CLR 170 at 177, where Gibbs CJ, Aickin, Wilson and Brennan JJ said:

     'For ourselves, we believe it to be unnecessary and indeed unwise to lay down rigid and exhaustive criteria. The circumstances of different cases are infinitely various. We would merely repeat, with approval, the oft-cited statement of Sir Frederick Jordan in re the Will of F B Gilbert (dec) (1946) 46 SR (NSW) 318 at 323: 



     "...I am of the opinion that...there is a material difference between an exercise of discretion on a point of practice or procedure and an exercise of discretion which determines substantive rights. In the former class of case, if a tight rein were not kept upon interference with the orders of Judges of first instance, the result would be disastrous to the proper administration of justice. The disposal of cases could be delayed interminably, and costs heaped up indefinitely, if a litigant with a long purse or a litigious disposition could, at will, in effect transfer all exercises of discretion in interlocutory applications from a Judge in chambers to a Court of Appeal."



     ...It is safe to say that the question of injustice flowing from the order appealed from will generally be a relevant and necessary consideration.'</text>
    </citation>

<citation "id=c16">
<class>cited</class>
<tocase>Cardile v LED Builders Pty Ltd [1999] HCA 18 ; (1999) 198 CLR 380</tocase>
<AustLII>http://www.austlii.edu.au/au/cases//cth/HCA/1999/18.html</AustLII>
<text>27 My reading of his Honour's reasons here was that he was far from satisfied with the nature or extent of the purported offshore structures and transactions to the extent apparent from the evidence, involving as they did the creation of a trust estate somewhat cognate to what have often been described as 'blind trusts'. Concerns of that nature appear to have persuaded or assisted to persuade the primary judge of the need to order that Ms Hemming submit to cross-examination on her disclosure affidavits. In determining to take that approach, his Honour paid regard to the relevant authorities dealing with both the grant of Mareva relief, and the making of orders ancillary to the same, including orders requiring the swearing of disclosure affidavits and cross-examination on those affidavits. After reviewing the relevant principles enunciated in those authorities, his Honour concluded at [28]: 

 '...ultimately the cautionary words of the four members of the High Court in [ Cardile v LED Builders Pty Ltd [1999] HCA 18 ; (1999) 198 CLR 380 at 403-404] set out at [18] above must be heeded. Orders made in the Court's ancillary jurisdiction must be founded on a doctrinal and principled basis. A Mareva order is protective of the Court's processes, including the efficacy of execution of those orders. Orders concerning disclosure affidavits and cross examination can, in turn, be made to render the Mareva order more efficacious. This is the touchstone for determining whether leave should be given to cross examine. A relevant consideration in determining whether leave should be given might, in an appropriate case, be the failure of the deponent of a disclosure affidavit to disclose assets completely or promptly or both. In such a case, leave might be given because doubts might arise about whether the deponent had understood and accepted the obligations and burdens imposed by the Mareva order and the ancillary order requiring the disclosure affidavit. Cross examination might be appropriate to test whether the disclosure affidavits fully revealed all assets on which the Mareva order operated and which might be available to satisfy any judgment. However, in other cases, other more significant factors might support the granting of leave to cross examine.' 

31 In my opinion, and for the reasons I have largely foreshadowed in my observations upon the submissions already recorded, the application for leave to appeal brought by the Sharman applicants has not sufficient cogency to justify the grant of any such leave. The case of the Music companies presented to the primary judge (Moore J) for relief of the nature and to the extent granted was sufficiently in line with established principle as to be clear from 'sufficient doubt'. I do not think that the United Kingdom and Australian authorities establish inflexible requirements to the extent postulated by the Sharman applicants, in particular concerning the Court's jurisdiction to grant leave to cross-examine the deponents of disclosure affidavits in Mareva contexts. His Honour's approach in particular to the issue of granting leave to the Music companies to cross-examine Ms Hemming was soundly justified in the light of the evidentiary circumstances concerning the Sharman applicants' offshore trust structure, and the circumstances of and context in which such a substantial sum of money was transferred to an offshore company in the amount and in the context that occurred.</text>
</citation>
</citations>
</case>
When I run my code, I get the following error:

xml.etree.ElementTree.ParseError: XML or text declaration not at start of entity: line 2, column 4
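I think this is caused by the first line of the XML file, `<?xml version="1.0"?>`, but the error persists even if I delete it. Can you help me? Any suggestions would be appreciated.

Thanks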

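The problem is that the XML contains some special characters, such as `&eacute;`.

Try the following code:

import re
import xml.etree.ElementTree as ET

# Strip the entity references that the XML parser does not recognise.
scrubbedXML = re.sub(r'&.+[0-9]+;', '', xmlstr)
scrubbedXML = re.sub(r'&eacute;', '', scrubbedXML)

root = ET.fromstring(scrubbedXML)
levels = root.findall('.//text')
for level in levels:
    print(level.text)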
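
As a sketch of a more targeted variant of this scrubbing idea, the named HTML entities can be replaced with the characters they stand for instead of being deleted. The helper name `replace_html_entities` and the sample string are made up for illustration; `name2codepoint` comes from the standard `html.entities` module. The five entities that XML predefines are left alone so the result stays parseable.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def replace_html_entities(text):
    """Replace named HTML entities (e.g. &eacute;) with their characters."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names as-is
        return chr(name2codepoint[name])

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", substitute, text)


# Hypothetical one-element sample, not taken from the file in the question.
sample = "<tocase>D&eacute;cor Corporation Pty Ltd v Dart &amp; Co</tocase>"
cleaned = replace_html_entities(sample)
root = ET.fromstring(cleaned)
print(root.text)
```

This keeps the accented character in the case name, whereas deleting the entity would turn "Décor" into "Dcor".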

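Your XML string is malformed. First, you need to remove the line break in front of the opening XML declaration, like this:

xmlstr = """<?xml version="1.0"?>

(Remember to close the `xmlstr` multi-line string with `"""`, the same way the string was opened above.)

Second, you need to fix the XML attributes such as `<citation "id=c0">`, where the quotes are misplaced; otherwise you will get a malformed-XML exception.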
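
The first point can be reproduced with a few lines. This is a minimal sketch using a made-up miniature document (not the file from the question): whitespace before `<?xml` makes `ET.fromstring` raise `ParseError`, and stripping the string first avoids it.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; note the newline and spaces before "<?xml".
xmlstr = """
    <?xml version="1.0"?>
    <case><name>example</name></case>
"""

try:
    ET.fromstring(xmlstr)
    error_message = None
except ET.ParseError as exc:
    # The declaration is only allowed at the very start of the document.
    error_message = str(exc)
print(error_message)

# Stripping the surrounding whitespace puts the declaration first again.
root = ET.fromstring(xmlstr.strip())
print(root.find("name").text)
```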

First, you have to fix `"id=c01"`; it should be `id="c01"` everywhere:

clean_string = xmlstring.replace('"id=', 'id="')

Then you need to unescape the HTML entity:

import html
clean_string = html.unescape(clean_string)   
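
Finally, you have to remove the leading whitespace, either by hand or simply with `.strip()`. Note that you also have to replace `find('text')` with `find('.//text')`, which looks for `text` elements at any nesting level. Alternatively, you can spell out the whole path to the text element.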

root = ET.fromstring(clean_string.strip())
content = root.find('.//text').text
print(content)

Here is the complete code for finding a single text element:

import html
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

xmlstring = """ YOUR XML HERE """

clean_string = xmlstring.replace('"id=', 'id="')
clean_string = html.unescape(clean_string)
root = ET.fromstring(clean_string.strip())
content = root.find('.//text').text
print(content)

But I assume you want to find all the text elements in the given XML file/string, so you can do this:

import html
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
clean_string = xmlstring.replace('"id=', 'id="')
clean_string = html.unescape(clean_string)
root = ET.fromstring(clean_string.strip())
for content in root.findall('.//text'):
    print(content.text)
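
Since the question mentions a whole folder of such files, the same three fixes could be wrapped in a function and applied to each file. This is only a sketch: the function names, the `*.xml` glob pattern, and the UTF-8 encoding are assumptions, not details given in the question.

```python
import html
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_texts(xmlstring):
    """Apply the three fixes above and return the content of every <text>."""
    clean_string = xmlstring.replace('"id=', 'id="')   # fix misplaced quotes
    clean_string = html.unescape(clean_string)         # &eacute; -> é
    root = ET.fromstring(clean_string.strip())         # drop leading whitespace
    return [element.text for element in root.findall('.//text')]


def extract_from_folder(folder):
    """Map each *.xml file name in `folder` to its texts (None if unparseable)."""
    results = {}
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            # Assumption: the files are UTF-8 encoded.
            results[path.name] = extract_texts(path.read_text(encoding="utf-8"))
        except ET.ParseError:
            results[path.name] = None  # still malformed after the fixes
    return results
```

One caveat with this approach: `html.unescape` also converts `&amp;` and `&lt;`, so a file that contains those entities would become malformed after unescaping and end up as `None` here.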

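This is not valid XML. If you squint, it looks a bit like badly written HTML. Enter BeautifulSoup, a package designed to clean up the ugliest of web pages. Because I don't want to worry about encoding issues, I let BS work on the raw file rather than reading it into a string: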

>>> from bs4 import BeautifulSoup
>>> with open("crud.xml", "rb") as fp:
...     soup = BeautifulSoup(fp)
... 
>>> text_nodes = soup.findAll("text")
>>> len(text_nodes)
6
>>> text_nodes[0]
<text>2 Wilcox J delivered judgment on the complex issues of...
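
If installing BeautifulSoup is not an option, the same "treat it as sloppy HTML" idea can be sketched with the standard library's `html.parser.HTMLParser`, which is a different and much more basic tool than BeautifulSoup. The class name `TextCollector` and the miniature sample are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the character data found inside each <text> element."""

    def __init__(self):
        # convert_charrefs=True turns references such as &eacute; into é.
        super().__init__(convert_charrefs=True)
        self.texts = []
        self._parts = None  # list of data chunks while inside <text>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "text":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "text" and self._parts is not None:
            self.texts.append("".join(self._parts))
            self._parts = None


# Hypothetical miniature sample with the same defects as the question's file.
sample = """<?xml version="1.0"?>
<case>
<citations>
<citation "id=c0">
<text>see D&eacute;cor at 398</text>
</citation>
</citations>
</case>"""

collector = TextCollector()
collector.feed(sample)
collector.close()
print(collector.texts)
```

The lenient parser does not complain about the misquoted `id` attribute or the HTML entity, which is why no clean-up step is needed before parsing.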
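`&eacute;` is a valid entity reference (Latin small letter e with acute accent), and I don't see any entity with digits in it. I don't see how this would help.
I tested this and it worked. @tdelaney: the other solutions also replace only the `&`, which just leads to more ambiguity.
That entity is not defined in XML itself; it is standard in HTML. The trick is to treat this document as badly formed HTML rather than XML.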
`&eacute;` is a valid HTML entity.
Oops, you are right. I now just unescape it instead of removing it.