Getting the text between two closing tags in XML - Python

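I downloaded my Foursquare data, which comes in KML format. I am parsing it as an XML file in Python and cannot figure out how to get the text between the closing a tag and the closing description tag. (This is the text I typed when checking in; in the example below it is "FINALLY HERE!! With Sonya and co", though there is also a hyphen in front of it.)

Here is a sample of what the data looks like: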

<Placemark>
  <name>hummus grill</name>
  <description>@<a href="https://foursquare.com/v/hummus-grill/4aab4f71f964a520625920e3">hummus grill</a>- FINALLY HERE!! With Sonya and co</description>
  <updated>Tue, 24 Jan 12 17:14:00 +0000</updated>
  <published>Tue, 24 Jan 12 17:14:00 +0000</published>
  <visibility>1</visibility>
  <Point>
    <extrude>1</extrude>
    <altitudeMode>relativeToGround</altitudeMode>
    <coordinates>-75.20104383595685,39.9528387056977</coordinates>
  </Point>
</Placemark>
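
I tried this (the data starts with a header section that I have not yet figured out how to handle):

for d in dom.getElementsByTagName('description'):
    description.append(d.firstChild.data.encode('utf-8'))

The header at the start of the data contains the text "foursquare checkin history" twice.

I then accessed it with d.firstChild.nextSibling.firstChild.data.encode('utf-8'), but that only gives me "hummus grill", which I assume is the text between the a tags (rather than the text from the name tag).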

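Have you tried using substrings?

For example, suppose all of the XML is in a variable "foo":

foo = '<description>@<a href="https://foursquare.com/v/hummus-grill/4aab4f71f964a520625920e3">hummus grill</a>- FINALLY HERE!! With Sonya and co</description>'

Just read up on substrings and you will be able to work with the text more easily.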
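
As a rough illustration of the substring idea, here is a minimal sketch that cuts the string at the first '</a>' and then at the following '</description>'. The helper name text_after_link is hypothetical and not part of the original answer, and the sketch assumes the fragment contains at most one link.

```python
# The description element as a plain string (copied from the question).
foo = ('<description>@<a href="https://foursquare.com/v/hummus-grill/'
       '4aab4f71f964a520625920e3">hummus grill</a>'
       '- FINALLY HERE!! With Sonya and co</description>')


def text_after_link(xml_fragment):
    """Return the text between the first '</a>' and the next '</description>'.

    Hypothetical helper for illustration; returns None if either marker
    is missing from the fragment.
    """
    # partition() splits once: (before, separator, after).
    _, found_a, rest = xml_fragment.partition('</a>')
    if not found_a:
        return None
    text, found_desc, _ = rest.partition('</description>')
    if not found_desc:
        return None
    return text


shout = text_after_link(foo)
print(shout)
```

Note that plain string slicing does not decode XML entities such as &amp;, so a real parser is the safer choice when the check-in text may contain them.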

The following works for me:

In [44]: description = []

In [45]: for d in dom.getElementsByTagName('description'):
   ....:     description.append(d.firstChild.nextSibling.nextSibling.data.encode('utf-8'))
   ....:     

In [46]: description
Out[46]: ['- FINALLY HERE!! With Sonya and co']
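
To see why two nextSibling hops land on the wanted text, it can help to list the direct children of the description element. The sketch below is only an illustration: it parses just the description fragment from the question, and the names doc, desc and children are assumptions introduced here.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Parse only the <description> element from the question's sample.
doc = parseString(
    '<description>@<a href="https://foursquare.com/v/hummus-grill/'
    '4aab4f71f964a520625920e3">hummus grill</a>'
    '- FINALLY HERE!! With Sonya and co</description>'
)
desc = doc.getElementsByTagName('description')[0]

# Record (kind, content) for each direct child of <description>.
children = []
for child in desc.childNodes:
    if child.nodeType == Node.TEXT_NODE:
        children.append(('text', child.data))
    elif child.nodeType == Node.ELEMENT_NODE:
        children.append(('element', child.tagName))

for kind, content in children:
    print(kind, repr(content))
```

The description has three children: the text "@", the a element, and the trailing text. firstChild is therefore "@", its nextSibling is the link, and the next nextSibling is the text typed at check-in.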
Or, if you want the entire text inside the description tag:

from xml.dom import Node
from xml.dom.minidom import parseString

def getText(node, recursive=False):
    """
    Get all the text associated with this node.
    With recursive == True, all text from child nodes is retrieved
    """
    L = []
    for n in node.childNodes:
        if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            # Plain text or CDATA: keep it as-is.
            L.append(n.data)
        elif not recursive:
            # A non-text child was found and we were asked not to descend.
            return None
        else:
            # Descend into the child, passing the flag along.
            L.append(getText(n, recursive))
    return ''.join(L)

dom = parseString("""<Placemark>
  <name>hummus grill</name>
  <description>@<a href="https://foursquare.com/v/hummus-grill/4aab4f71f964a520625920e3">hummus grill</a>- FINALLY HERE!! With Sonya and co</description>
  <updated>Tue, 24 Jan 12 17:14:00 +0000</updated>
  <published>Tue, 24 Jan 12 17:14:00 +0000</published>
  <visibility>1</visibility>
  <Point>
    <extrude>1</extrude>
    <altitudeMode>relativeToGround</altitudeMode>
    <coordinates>-75.20104383595685,39.9528387056977</coordinates>
  </Point>
</Placemark>""")

description = []

for d in dom.getElementsByTagName('description'):
    description.append(getText(d, recursive=True))

print(description)

This prints:
['@hummus grill- FINALLY HERE!! With Sonya and co']
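
For comparison, here is a sketch of a different technique that is not from the answers in this thread: the standard library's xml.etree.ElementTree exposes the text that follows a closing tag as that element's tail attribute. The variable names (placemark, tails, full_texts) are assumptions made for illustration, and the sample is trimmed to the two elements that matter.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Placemark from the question.
placemark = ET.fromstring(
    '<Placemark>'
    '<name>hummus grill</name>'
    '<description>@<a href="https://foursquare.com/v/hummus-grill/'
    '4aab4f71f964a520625920e3">hummus grill</a>'
    '- FINALLY HERE!! With Sonya and co</description>'
    '</Placemark>'
)

tails = []       # text that follows each </a> inside a description
full_texts = []  # all text inside each description, tags stripped

for desc in placemark.iter('description'):
    link = desc.find('a')
    # .tail holds the text between </a> and the next tag (or None).
    if link is not None and link.tail is not None:
        tails.append(link.tail)
    # itertext() yields every text piece in document order.
    full_texts.append(''.join(desc.itertext()))

print(tails)
print(full_texts)
```

If the real KML file declares a default XML namespace, the tag names passed to iter() and find() would need to include that namespace; the sample above has none, so plain names are enough here.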

So do I need to convert the DOM element to a substring? Or are you suggesting a completely different route?
Yes. Storing the whole DOM element in a variable lets you easily go back and pull out specific parts. Substrings tend to be a simple way to parse text.