Python: parsing a Lotus Notes clipboard link with lxml

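I'm trying to parse a Lotus Notes document link (taken from the clipboard) so I can convert it into a Notes:// URL/URI. Of the available clipboard formats, the plain-text one looked like the easiest to convert. However, the link looks like very badly formed XML, and lxml loses information when parsing it: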

data = """Name - Enc: Injeção
<NDL>
<REPLICA 83257B7B:00608A81>
<VIEW OFDCBCE5C7:007D345D-ON882572F4:00650240>
<NOTE OFD18FCA06:36A9EDA2-ON83257F6A:004E31C1>
<HINT>CN=SERV101/OU=RJ/OU=C/O=Company</HINT>
<REM>Database 'Name', View 'Inbox', Document 'Enc: Injeção'</REM>
</NDL>
"""
from lxml import html, etree
title, ndl = html.fragments_fromstring(data)
replica = ndl[0]
view = replica[0]
print replica.attrib
print view.attrib
print html.tostring(ndl)

This prints:

{}
{'ofdcbce5c7:007d345d-on882572f4:00650240': ''}
<ndl>
<replica>
<view ofdcbce5c7:007d345d-on882572f4:00650240>
<note ofd18fca06:36a9eda2-on83257f6a:004e31c1>
<hint>CN=SERV101/OU=RJ/OU=C/O=Company</hint>
<rem>Database 'Name', View 'Inbox', Document 'Enc: Inje&#195;&#167;&#195;&#163;o'</rem>
</note></view></replica></ndl>

So I'm losing the information in the REPLICA tag, although I still get some of it from VIEW (I suspect the hyphen may play a role here).

So, is there a way to get all of the data with lxml, or do I have to fall back to regular expressions?
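If you do fall back to regular expressions, the text is regular enough that a short sketch covers it. This assumes the clipboard text always has the shape shown above: a title line, then upper-case `REPLICA`/`VIEW`/`NOTE` tags that each carry one bare token and are never closed, then paired `HINT`/`REM` tags. The function name `parse_ndl` is hypothetical.

```python
# -*- coding: utf-8 -*-
import re

data = """Name - Enc: Injeção
<NDL>
<REPLICA 83257B7B:00608A81>
<VIEW OFDCBCE5C7:007D345D-ON882572F4:00650240>
<NOTE OFD18FCA06:36A9EDA2-ON83257F6A:004E31C1>
<HINT>CN=SERV101/OU=RJ/OU=C/O=Company</HINT>
<REM>Database 'Name', View 'Inbox', Document 'Enc: Injeção'</REM>
</NDL>
"""

def parse_ndl(text):
    """Hypothetical helper: pull the title and every tag value out of the text."""
    # The title is whatever comes before the opening <NDL> tag.
    result = {"title": text.split("<NDL>", 1)[0].strip()}
    # Bare tags: <NAME token> with no closing tag (assumed upper-case names).
    for name, value in re.findall(r"<(REPLICA|VIEW|NOTE)\s+([^>\s]+)>", text):
        result[name.lower()] = value
    # Paired tags: <NAME>text</NAME>; \1 makes the closing name match the opening one.
    for name, value in re.findall(r"<(HINT|REM)>(.*?)</\1>", text, re.S):
        result[name.lower()] = value
    return result

info = parse_ndl(data)
print(info["replica"])
print(info["view"])
print(info["hint"])
```

Unlike the HTML parsers, this keeps the original case of the IDs, but it will silently miss anything that deviates from the assumed layout.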

Environment:

  • Windows 7, 64-bit
  • Python 2.7.11 | Anaconda 2.4.1 (32-bit)
  • lxml 3.4.4
You might find that BeautifulSoup, using Python's built-in html.parser, keeps more of the data:

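data = """Name - Enc: Injeção
<NDL>
<REPLICA 83257B7B:00608A81>
<VIEW OFDCBCE5C7:007D345D-ON882572F4:00650240>
<NOTE OFD18FCA06:36A9EDA2-ON83257F6A:004E31C1>
<HINT>CN=SERV101/OU=RJ/OU=C/O=Company</HINT>
<REM>Database 'Name', View 'Inbox', Document 'Enc: Injeção'</REM>
</NDL>
"""
# The same input through lxml's HTML parser
from lxml.etree import fromstring, HTMLParser
xml = fromstring(data, HTMLParser())
r = xml.xpath("//replica")

from bs4 import BeautifulSoup
soup = BeautifulSoup(data, "html.parser")
# The title is the text node immediately preceding <ndl>
title = next(soup.find("ndl").previous_elements)
print(title)
print(soup.find("replica").attrs)
print(soup.find("view"))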

This gives you:

Name - Enc: Injeção

{u'83257b7b:00608a81': ''}
<view ofdcbce5c7:007d345d-on882572f4:00650240="">
<note ofd18fca06:36a9eda2-on83257f6a:004e31c1="">
<hint>CN=SERV101/OU=RJ/OU=C/O=Company</hint>
<rem>Database 'Name', View 'Inbox', Document 'Enc: Injeção'</rem>
</note></view>
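Another option, if you would rather keep a real parser in the loop, is to normalise the text first and then parse it as XML. This is not from the answer above; it is a sketch that assumes `REPLICA`, `VIEW` and `NOTE` each hold a single bare token and are never closed, so they can be rewritten as self-closing elements with an `id` attribute (the attribute name `id` and the function name `ndl_to_element` are hypothetical choices). It uses the standard library's `xml.etree.ElementTree`; `lxml.etree.fromstring` should accept the same normalised string.

```python
# -*- coding: utf-8 -*-
import re
import xml.etree.ElementTree as ET

data = """Name - Enc: Injeção
<NDL>
<REPLICA 83257B7B:00608A81>
<VIEW OFDCBCE5C7:007D345D-ON882572F4:00650240>
<NOTE OFD18FCA06:36A9EDA2-ON83257F6A:004E31C1>
<HINT>CN=SERV101/OU=RJ/OU=C/O=Company</HINT>
<REM>Database 'Name', View 'Inbox', Document 'Enc: Injeção'</REM>
</NDL>
"""

def ndl_to_element(text):
    """Hypothetical helper: return (title, root) with root the parsed <NDL> element."""
    start = text.index("<NDL>")
    end = text.index("</NDL>") + len("</NDL>")
    title = text[:start].strip()
    # Assumption: bare-token tags are never closed, so make them self-closing
    # and move the token into an "id" attribute.
    ndl = re.sub(r"<(REPLICA|VIEW|NOTE)\s+([^>\s]+)>", r'<\1 id="\2"/>',
                 text[start:end])
    return title, ET.fromstring(ndl)

title, root = ndl_to_element(data)
print(title)
for child in root:
    # Bare-token elements carry their value in "id"; HINT and REM in their text.
    print("{}: {}".format(child.tag, child.get("id") or child.text))
```

One caveat: a document title containing `&` or `<` inside `REM` would make the normalised string invalid XML, so you would need to escape that text before parsing.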

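What version of Lotus Notes are you using? I was under the impression that recent versions put a notes:// URL on the clipboard as an alternative format. (Come to think of it, that might only apply to drag-and-drop. How are you putting the data on the clipboard?)
Lotus Notes 8.5.3. I copy the link by right-clicking the document and choosing "Copy as Document Link".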
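Since 8.5.3 only gives you the text format, the notes:// URL has to be assembled by hand once the IDs are extracted. The sketch below rests on several assumptions that you should check against what your Notes client actually accepts: the URL layout is `notes://server/replicaID/viewUNID/documentUNID`; a replica ID is the `REPLICA` token with the colon removed; a UNID is the `OF` part followed by the `ON` part with the prefixes and the `-`/`:` separators removed; and only the server's common name (`CN=...`) from `HINT` is used. The helper names `ndl_hex`, `server_common_name` and `notes_url` are hypothetical.

```python
import re

def ndl_hex(ndl_id):
    """Hypothetical helper: keep only the hex digits of an NDL identifier."""
    # Upper-case first so lower-cased IDs (as the HTML parsers return them) work too.
    # "O" and "N" are not hex digits, so "OF"/"ON" can only be the prefixes.
    return re.sub(r"OF|ON|[-:]", "", ndl_id.upper())

def server_common_name(hint):
    """Assumption: the URL uses only the CN part of the canonical server name."""
    first = hint.split("/")[0]
    return first[3:] if first.upper().startswith("CN=") else first

def notes_url(hint, replica, view, note):
    """Assumed layout: notes://server/replicaID/viewUNID/documentUNID."""
    return "notes://{}/{}/{}/{}".format(
        server_common_name(hint), ndl_hex(replica), ndl_hex(view), ndl_hex(note))

print(notes_url("CN=SERV101/OU=RJ/OU=C/O=Company",
                "83257B7B:00608A81",
                "OFDCBCE5C7:007D345D-ON882572F4:00650240",
                "OFD18FCA06:36A9EDA2-ON83257F6A:004E31C1"))
```

For the sample link this yields `notes://SERV101/83257B7B00608A81/DCBCE5C7007D345D882572F400650240/D18FCA0636A9EDA283257F6A004E31C1`.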