How do I parse XML in Python?
I have an .odf file.
I only want to pull out the href value Text/chapter1.xhtml.
How can I do that?
Here is a sample:
<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http:/pf">
<metadata xmlns:dc="http:ts/1.1/" xmlns:opf="ht200pf">
<dc:identifier opf:scheme="ISBN" id="BookId">urn:19be</dc:identifier>
<dc:title>samplesample</dc:title>
<dc:creator />
<dc:language>ko</dc:language>
<meta name="cover" content="image" />
<meta content="0.9.18" name="Sigil version" />
<dc:date opf:event="modification" xmlns:opf="httopf">2019-12-12</dc:date>
</metadata>
<manifest>
<item id="tocncx" href="toc.ncx" media-type="application/xhtml+xml"/>
<item id="titlepage" href="Text/titlepage.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter2" href="Text/chapter2.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter3" href="Text/chapter3.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter4" href="Text/chapter4.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter5" href="Text/chapter5.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter6" href="Text/chapter6.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine toc="tocncx">
<itemref idref="titlepage"/>
<itemref idref="chapter1"/>
<itemref idref="chapter2"/>
<itemref idref="chapter3"/>
<itemref idref="chapter4"/>
<itemref idref="chapter5"/>
<itemref idref="chapter6"/>
</spine>
</package>
I'm not sure exactly what you want, but this may help:
from simplified_scrapy import SimplifiedDoc
html='''
<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http:/pf">
<metadata xmlns:dc="http:ts/1.1/" xmlns:opf="ht200pf">
<dc:identifier opf:scheme="ISBN" id="BookId">urn:19be</dc:identifier>
<dc:title>samplesample</dc:title>
<dc:creator />
<dc:language>ko</dc:language>
<meta name="cover" content="image" />
<meta content="0.9.18" name="Sigil version" />
<dc:date opf:event="modification" xmlns:opf="httopf">2019-12-12</dc:date>
</metadata>
<manifest>
<item id="tocncx" href="toc.ncx" media-type="application/xhtml+xml"/>
<item id="titlepage" href="Text/titlepage.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter2" href="Text/chapter2.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter3" href="Text/chapter3.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter4" href="Text/chapter4.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter5" href="Text/chapter5.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter6" href="Text/chapter6.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine toc="tocncx">
<itemref idref="titlepage"/>
<itemref idref="chapter1"/>
<itemref idref="chapter2"/>
<itemref idref="chapter3"/>
<itemref idref="chapter4"/>
<itemref idref="chapter5"/>
<itemref idref="chapter6"/>
</spine>
</package>'''
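doc = SimplifiedDoc(html)
# href values of every <item> inside <manifest>
hrefs = doc.manifest.selects('item').select('href()')
print(hrefs)
# href of the single item whose id is chapter1
href = doc.manifest.select("item#chapter1>href()")
print(href)
# the whole chapter1 <item> element
item = doc.manifest.select("item#chapter1")
print(item)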
I want to extract Text/chapter1.xhtml through Text/chapter6.xhtml. Sorry for the confusion, and thank you.
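Given that clarification, here is a minimal sketch using only the standard library's xml.etree.ElementTree instead of simplified_scrapy. It assumes the chapter entries always follow the pattern Text/chapterN.xhtml, as in the sample. The names SAMPLE, local_name and chapter_hrefs are made up for this illustration, and the sample is shortened to the manifest only.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question (metadata and spine omitted).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http:/pf">
<manifest>
<item id="tocncx" href="toc.ncx" media-type="application/xhtml+xml"/>
<item id="titlepage" href="Text/titlepage.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter2" href="Text/chapter2.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter3" href="Text/chapter3.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter4" href="Text/chapter4.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter5" href="Text/chapter5.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter6" href="Text/chapter6.xhtml" media-type="application/xhtml+xml"/>
</manifest>
</package>"""

# Assumed naming pattern for chapter files; adjust if your files differ.
CHAPTER_PATTERN = re.compile(r"Text/chapter\d+\.xhtml")


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def chapter_hrefs(xml_text):
    """Return the href of every <item> whose href looks like a chapter file."""
    # Parsing bytes lets the parser honour the encoding in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        elem.get("href")
        for elem in root.iter()
        if local_name(elem.tag) == "item"
        and CHAPTER_PATTERN.fullmatch(elem.get("href", ""))
    ]


hrefs = chapter_hrefs(SAMPLE)
print(hrefs)
```

Matching on the local tag name sidesteps the default namespace declared on the package element, so the same function should work whatever namespace URI the real file uses. To run it on a file rather than an inline string, read the file's text first and pass it to chapter_hrefs.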