Parsing large XML files in Python 3
Tags: python, python-3.x, xml-parsing, lxml

I'm new to Python and am looking for a fast implementation to parse large XML files (~0.5-1 GB) that follow this template:
<timestep time="2.00">
    <vehicle id="carflow.0" x="-9897.274589" y="-8.250000" speed="49.840822" lane="section1_0" />
    .... (more vehicles)
</timestep>
... (more timesteps)
Is there a way to improve my code?
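# Imports assumed from the usage below; the question is also tagged lxml,
# whose lxml.etree module provides an iterparse with the same events argument.
import xml.etree.ElementTree as ET
import pandas as pd

def parseXML(filename):
    df = pd.DataFrame()
    old_time = 0.0
    time = 0.0
    events = ("end", "start")
    tree = ET.iterparse(filename, events=events)
    for event, elem in tree:
        if elem.tag == "timestep" and event == "start":
            # Attributes are already available on the "start" event.
            time = float(elem.attrib.get('time'))
        elif elem.tag == "timestep" and event == "end":
            elem.clear()
        elif elem.tag == 'vehicle' and event == "end":
            id = int(elem.attrib.get('id').split('.')[1])
            x = float(elem.attrib.get('x'))
            y = float(elem.attrib.get('y'))
            speed = float(elem.attrib.get('speed'))
            lane = int(elem.attrib.get('lane').split('_')[1])
            data = pd.DataFrame([time, id, x, y, speed, lane]).T
            elem.clear()
            # Originally df = df.append(data); DataFrame.append was removed in
            # pandas 2.0, and pd.concat is the equivalent call.
            df = pd.concat([df, data])
        # Progress report every 50 time units.
        if time % 50 == 0 and time != old_time:
            old_time = time
            print(time)
    df.columns = ['time', 'id', 'x', 'y', 'speed', 'lane']
    return df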
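
One likely bottleneck in the code above is that the DataFrame grows by one row per vehicle, which copies all existing rows each time. A minimal sketch of an alternative is to collect plain tuples and build the DataFrame once at the end. The names `iter_vehicle_rows` and `parse_xml_fast` are hypothetical, and the sketch assumes the `timestep` elements sit inside a single wrapping root element, which the template does not show.

```python
import xml.etree.ElementTree as ET


def iter_vehicle_rows(filename):
    """Yield one (time, id, x, y, speed, lane) tuple per <vehicle> element."""
    context = ET.iterparse(filename, events=("start", "end"))
    # The first event is the start of the root element (assumed to be a
    # wrapper around the <timestep> elements, not a <timestep> itself).
    _, root = next(context)
    time = 0.0
    for event, elem in context:
        if elem.tag == "timestep":
            if event == "start":
                time = float(elem.get("time"))
            else:
                # Drop the finished timestep from the root so memory use
                # stays flat instead of growing with the file.
                root.clear()
        elif elem.tag == "vehicle" and event == "end":
            yield (
                time,
                int(elem.get("id").split(".")[1]),
                float(elem.get("x")),
                float(elem.get("y")),
                float(elem.get("speed")),
                int(elem.get("lane").split("_")[1]),
            )


def parse_xml_fast(filename):
    """Build the DataFrame in a single step from the collected rows."""
    # Imported here so the row generator has no third-party dependency.
    import pandas as pd

    columns = ["time", "id", "x", "y", "speed", "lane"]
    return pd.DataFrame(list(iter_vehicle_rows(filename)), columns=columns)
```

Calling `parse_xml_fast("your_file.xml")` should return the same columns as `parseXML`, with a fresh 0..n-1 index rather than an index of all zeros. Whether it is fast enough for a 1 GB file would need to be measured on the real data.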