Parsing XML with Python and saving it as txt
I have a folder of .xml files that look like this:
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
      <PMID Version="1">23458631</PMID>
      <DateCreated>
        <Year>2013</Year>
        <Month>04</Month>
        <Day>08</Day>
      </DateCreated>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Animals</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Calcium</DescriptorName>
          <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Calcium Chloride</DescriptorName>
          <QualifierName MajorTopicYN="N">administration &amp; dosage</QualifierName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="Publisher" Owner="NLM">
      <PMID Version="1">23458629</PMID>
      <DateCreated>
        <Year>2013</Year>
        <Month>3</Month>
        <Day>20</Day>
      </DateCreated>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Adolescent</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Adult</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Anthropometry</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
Use ElementTree.
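A minimal sketch of that approach, assuming a trimmed copy of the sample is held in a string. The names `SAMPLE_XML` and `extract_pmids` are made up for illustration; for a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (illustrative only).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
      <PMID Version="1">23458631</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="Publisher" Owner="NLM">
      <PMID Version="1">23458629</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_pmids(xml_text):
    """Return the text of every PMID element, in document order."""
    root = ET.fromstring(xml_text)
    return [pmid.text for pmid in root.iter('PMID')]


# Print one PMID per line.
for pmid in extract_pmids(SAMPLE_XML):
    print(pmid)
```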
The output of this is:
23458631
23458629
Once you have the element values, you can build a string and write it to a file.
Here you go:
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

with open('my_text_file.txt', 'w') as f:
    f.write('ArticleID|CreatedDate|MeSH|IsMajor\n')

for pubmed_article in root.findall('PubmedArticle'):
    ArticleID = pubmed_article.find('MedlineCitation').find('PMID').text
    year = pubmed_article.find('MedlineCitation').find('DateCreated').find('Year').text
    month = pubmed_article.find('MedlineCitation').find('DateCreated').find('Month').text
    day = pubmed_article.find('MedlineCitation').find('DateCreated').find('Day').text
    CreatedDate = year + month + day
    for mesh_heading in pubmed_article.find('MedlineCitation').find('MeshHeadingList').findall('MeshHeading'):
        MeSH = mesh_heading.find('DescriptorName').text
        IsMajor = mesh_heading.find('DescriptorName').get('MajorTopicYN')
        line_to_write = ArticleID + '|' + CreatedDate + '|' + MeSH + '|' + IsMajor + '\n'
        with open('my_text_file.txt', 'a') as f:
            f.write(line_to_write)
Here is the output file:
ArticleID|CreatedDate|MeSH|IsMajor
23458631|20130408|Animals|N
23458631|20130408|Calcium|N
23458631|20130408|Calcium Chloride|N
23458629|20130320|Adolescent|N
23458629|20130320|Adult|N
23458629|20130320|Anthropometry|N
Here is my version:
import xml.etree.ElementTree as ET

xml_path = r'Y:\Misc\stack_overflow\Python\xml_extract\data.xml'
output_file_path = 'output.txt'

# Open in text mode so str values can be written directly.
f = open(output_file_path, 'w')
f.write('ArticleID|CreatedDate|MeSH|IsMajor\n')

tree = ET.parse(xml_path)
root = tree.getroot()

for pa in root.iter('PubmedArticle'):
    ArticleID = pa.find('MedlineCitation/PMID').text
    # zfill(2) pads single-digit months and days, e.g. '3' -> '03'.
    CreatedDate = pa.find('MedlineCitation/DateCreated/Year').text+\
                  pa.find('MedlineCitation/DateCreated/Month').text.zfill(2)+\
                  pa.find('MedlineCitation/DateCreated/Day').text.zfill(2)
    for mh in pa.iter('MeshHeading'):
        DescriptorName = mh.find('DescriptorName').text
        MajorTopicYN = mh.find('DescriptorName').attrib['MajorTopicYN']
        f.write(ArticleID+'|'+CreatedDate+'|'+DescriptorName+'|'+MajorTopicYN+'\n')

f.close()
The output in the file is:
ArticleID|CreatedDate|MeSH|IsMajor
23458631|20130408|Animals|N
23458631|20130408|Calcium|N
23458631|20130408|Calcium Chloride|N
23458629|20130320|Adolescent|N
23458629|20130320|Adult|N
23458629|20130320|Anthropometry|N
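Since the question mentions a folder of .xml files rather than a single one, the same loop could be wrapped in a function and applied to each file in turn. This is only a sketch: the names `rows_from_file` and `export_folder`, the `*.xml` pattern, and the choice to write all rows into one output file are assumptions, not something stated in the question.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

HEADER = 'ArticleID|CreatedDate|MeSH|IsMajor'


def rows_from_file(xml_path):
    """Yield one pipe-delimited row per MeshHeading in a single XML file."""
    root = ET.parse(xml_path).getroot()
    for pa in root.iter('PubmedArticle'):
        article_id = pa.find('MedlineCitation/PMID').text
        date = pa.find('MedlineCitation/DateCreated')
        # Zero-pad month and day so the date is always eight characters.
        created = (date.find('Year').text
                   + date.find('Month').text.zfill(2)
                   + date.find('Day').text.zfill(2))
        for mh in pa.iter('MeshHeading'):
            descriptor = mh.find('DescriptorName')
            yield '|'.join([article_id, created, descriptor.text,
                            descriptor.get('MajorTopicYN')])


def export_folder(folder, output_path):
    """Write the rows of every .xml file in `folder` to one text file."""
    with open(output_path, 'w', encoding='utf-8') as out:
        out.write(HEADER + '\n')
        # sorted() keeps the file order stable between runs.
        for xml_path in sorted(Path(folder).glob('*.xml')):
            for row in rows_from_file(xml_path):
                out.write(row + '\n')
```

Calling `export_folder('xml_folder', 'output.txt')` (both names hypothetical) would then produce a single file in the same layout as above. Opening the output once, rather than once per row, also avoids repeated file opens.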
Did you change the input file? As far as I can tell, this code will cause some dates to come out as 2013320 instead of 20130320.
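The difference comes down to zero-padding: the second sample article has `<Month>3</Month>`, so plain concatenation of year, month and day gives a seven-character date. A small sketch of the padded form, using a hypothetical helper `created_date` that does not appear in either answer:

```python
import xml.etree.ElementTree as ET


def created_date(date_created):
    """Join Year, Month and Day into YYYYMMDD, zero-padding month and day."""
    year = date_created.find('Year').text
    month = date_created.find('Month').text.zfill(2)
    day = date_created.find('Day').text.zfill(2)
    return year + month + day


# The DateCreated element of the second sample article.
element = ET.fromstring(
    '<DateCreated><Year>2013</Year><Month>3</Month><Day>20</Day></DateCreated>'
)
print(created_date(element))  # 20130320
```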