Parsing XML in Python

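I want to parse the XML file fetched in the code below. The result should go into a DataFrame with three columns: Date, Approve, Disapprove. The XML file is dynamic, since a new date is added every day, so the code should account for that. I have implemented a static solution, in which I have to hard-code the number of value tags to loop over. I would like to learn how to do this dynamically.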

import numpy as np
import pandas as pd
import requests
from pattern import web

xml = requests.get('http://charts.realclearpolitics.com/charts/1044.xml').text
dom = web.Element(xml)
values = dom.by_tag('value')

date = []
approve = []
disapprove = []

# Stop at 1720 rather than 1727: the last 7 values of the Approve and Disapprove tags are blank.
for i in range(0,1720):
    date.append(pd.to_datetime(values[i].content))

# Stop at 3447 rather than 3454: the last 7 values are blank, and including them raises an error when converting to float.
for i in range(1727,3447):
    a = float(values[i].content)
    approve.append(a)

# Stop at 5174 rather than 5181: the last 7 values are blank.
for i in range(3454,5174):
    a = float(values[i].content)
    disapprove.append(a)

finalresult = pd.DataFrame({'date': date, 'Approve': approve, 'Disapprove': disapprove})
finalresult

Here is one approach using lxml and XPath:

from lxml import etree
import pandas as pd

tree = etree.parse("http://charts.realclearpolitics.com/charts/1044.xml")

date = [s.text for s in tree.xpath("series/value")]
approve = [float(s.text) if s.text else 0.0
           for s in tree.xpath("graphs/graph[@title='Approve']/value")]
disapprove = [float(s.text) if s.text else 0.0
              for s in tree.xpath("graphs/graph[@title='Disapprove']/value")]

assert len(date) == len(approve) == len(disapprove)

finalresult = pd.DataFrame({'Date': date, 'Approve': approve, 'Disapprove': disapprove})
print(finalresult)

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1727 entries, 0 to 1726
Data columns (total 3 columns):
Date          1727  non-null values
Approve       1727  non-null values
Disapprove    1727  non-null values
dtypes: float64(2), object(1)

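lxml supports XPath, which seems to be what you want. You can then use XPath expressions to pull out the elements, no matter how many there are.

Thanks for the code. It parses well. It also has 1720 non-null values. But it contains 7 "None" values at the end, which makes operations like finalresult.Approve.sum() impossible?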
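On the blank trailing values raised in the comment above: one possible way to handle them is to convert the blanks to NaN instead of leaving None (or substituting 0.0), since pandas skips NaN in sum() by default. The following is only a minimal sketch under stated assumptions. The sample XML and its values are made up, and its layout is inferred from the XPath expressions in the answer rather than taken from the real feed. build_frame is a hypothetical helper name. The standard-library xml.etree.ElementTree is swapped in for lxml so the example runs without a network connection; the same simple paths also work with lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Made-up sample: the layout is assumed from the XPath expressions in the
# answer, and the numbers are illustrative only.
SAMPLE_XML = """
<chart>
  <series>
    <value xid="0">2009-01-27</value>
    <value xid="1">2009-01-28</value>
    <value xid="2">2009-01-29</value>
  </series>
  <graphs>
    <graph title="Approve">
      <value xid="0">63.3</value>
      <value xid="1">63.5</value>
      <value xid="2"></value>
    </graph>
    <graph title="Disapprove">
      <value xid="0">20.0</value>
      <value xid="1">19.3</value>
      <value xid="2"></value>
    </graph>
  </graphs>
</chart>
"""


def build_frame(xml_text):
    """Hypothetical helper: build the Date/Approve/Disapprove frame."""
    root = ET.fromstring(xml_text.strip())

    def texts(path):
        # Text of every matching element; an empty element yields None.
        return [el.text for el in root.findall(path)]

    frame = pd.DataFrame({
        "Date": texts("series/value"),
        "Approve": texts("graphs/graph[@title='Approve']/value"),
        "Disapprove": texts("graphs/graph[@title='Disapprove']/value"),
    })
    frame["Date"] = pd.to_datetime(frame["Date"])
    for col in ("Approve", "Disapprove"):
        # errors="coerce" turns blank or unparsable entries into NaN.
        frame[col] = pd.to_numeric(frame[col], errors="coerce")
    return frame


finalresult = build_frame(SAMPLE_XML)
print(finalresult["Approve"].sum())  # NaN entries are skipped by default
# Optionally drop the rows that have no poll values yet.
print(finalresult.dropna(subset=["Approve", "Disapprove"]))
```

No row count is hard-coded, so a date added tomorrow is picked up automatically, and the blank entries at the end no longer block aggregate operations.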