Python 3.x: XML to CSV in Python (Python 3.x, Pandas, Csv, Beautifulsoup, Xml.etree) - Fatal编程技术网



The XML data (file.xml) looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Activity_Logs xsi:schemaLocation="http://www.cisco.com/PowerKEYDVB/Auditing DailyActivityLog.xsd" To="2018-04-01" From="2018-04-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>10</pkgEid>
        <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>18</pkgEid>
        <pkgName>CNNINT</pkgName>
    </ActivityRecord>
</Activity_Logs>

The code I am using does not write any data to the CSV. Can someone tell me where it goes wrong?
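The question's own code is not shown, so the cause of the empty CSV cannot be confirmed. One frequent cause with this particular file is the default namespace (xmlns="http://www.cisco.com/PowerKEYDVB/Auditing"): ElementTree folds it into every tag name, so lookups by the bare name find nothing. A minimal standard-library sketch, using a trimmed copy of the sample (XML declaration and schema attributes dropped):

```python
import xml.etree.ElementTree as ET

# The two records from the question, trimmed to the default namespace only.
xml_text = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>10</pkgEid>
        <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>18</pkgEid>
        <pkgName>CNNINT</pkgName>
    </ActivityRecord>
</Activity_Logs>"""

root = ET.fromstring(xml_text)
print(root.tag)  # {http://www.cisco.com/PowerKEYDVB/Auditing}Activity_Logs

# A bare tag name does not match, because the default xmlns is part of the name.
plain = root.findall("ActivityRecord")
print(len(plain))  # 0

# Mapping a prefix to the namespace URI makes the lookup match.
ns = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
qualified = root.findall("a:ActivityRecord", ns)
print(len(qualified))  # 2
```

The prefix "a" is arbitrary; only the URI it maps to has to match the document.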

Using pandas and BeautifulSoup, you can easily get the expected output:

#Code:

import itertools

import pandas as pd
from bs4 import BeautifulSoup as b

with open("file.xml", "r") as f:  # opening xml file
    content = f.read()

# The "lxml" parser treats the input as HTML, so tag names are lowercased
soup = b(content, "lxml")
pkgeid = [values.text for values in soup.find_all("pkgeid")]
pkgname = [values.text for values in soup.find_all("pkgname")]
time = [values.text for values in soup.find_all("time")]
oper = [values.text for values in soup.find_all("oper")]
# For python-3.x use the `zip_longest` method
# For python-2.x use the `izip_longest` method
data = list(itertools.zip_longest(time, oper, pkgeid, pkgname))
df = pd.DataFrame(data=data)
df.to_csv("sample.csv", index=False, header=False)
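Collecting each tag into its own list and zipping the lists back together only keeps rows aligned when every record contains every element. If an element is missing from one record altogether (the situation raised in the comments about pkgEid), every later value in that column shifts up a row. The sketch below shows the effect and a per-record alternative; it uses the standard-library ElementTree instead of BeautifulSoup so it runs without extra packages, and the records and the values T1 to T3 are made up for illustration:

```python
import itertools
import xml.etree.ElementTree as ET

# Made-up records: the second <ActivityRecord> has no <pkgEid> at all.
xml_text = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord><time>T1</time><pkgEid>10</pkgEid></ActivityRecord>
  <ActivityRecord><time>T2</time></ActivityRecord>
  <ActivityRecord><time>T3</time><pkgEid>18</pkgEid></ActivityRecord>
</Activity_Logs>"""
ns = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
root = ET.fromstring(xml_text)

# Column-wise: one list per tag, zipped back together afterwards.
times = [e.text for e in root.findall(".//a:time", ns)]
eids = [e.text for e in root.findall(".//a:pkgEid", ns)]
column_wise = list(itertools.zip_longest(times, eids, fillvalue=""))
print(column_wise)  # [('T1', '10'), ('T2', '18'), ('T3', '')]  18 lands on T2

# Record-wise: each field is looked up inside its own <ActivityRecord>,
# with "" as the default when the element is missing.
record_wise = [
    (rec.findtext("a:time", "", ns), rec.findtext("a:pkgEid", "", ns))
    for rec in root.findall("a:ActivityRecord", ns)
]
print(record_wise)  # [('T1', '10'), ('T2', ''), ('T3', '18')]
```

The same reasoning applies to the BeautifulSoup version: looping over each activityrecord and searching inside it keeps a missing field in its own row.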


I found the most suitable approach:

import glob

import pandas as pd
from bs4 import BeautifulSoup as b

# Assumption: every XML file in the current directory should be converted;
# change the pattern (or use ["file.xml"]) to match your files.
files_xml = glob.glob("*.xml")

fields = ["time", "oper", "pkgeid", "pkgname", "dhct", "sourceid"]
rows = []

for each_file in files_xml:
    with open(each_file, "r") as f:  # opening xml file
        content = f.read()
    soup = b(content, "lxml")  # the HTML parser lowercases tag names

    for values in soup.find_all("activityrecord"):
        row = []
        for field in fields:
            tag = values.find(field)
            # A missing element becomes "" instead of raising
            # AttributeError: 'NoneType' object has no attribute 'text'
            row.append("" if tag is None else tag.text)
        rows.append(row)

# Building the frame once, after the loop, keeps the records of every file
df = pd.DataFrame(rows, columns=['Time', 'Oper', 'PkgEid', 'PkgName', 'dhct', 'sourceid'])
df.to_csv("new.csv", index=False)
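The loop above is meant to gather the records from several XML files into a single CSV. A standard-library sketch of the same idea, assuming the namespace from the question's sample and using two in-memory strings in place of real files. The capitalisation of dhct and sourceid is a guess: the BeautifulSoup "lxml" parser lowercases every tag, so the real spelling is not visible here, and ElementTree matches names case-sensitively.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
# Assumed spellings: "dhct" and "sourceid" do not appear in the sample XML.
FIELDS = ["time", "oper", "pkgEid", "pkgName", "dhct", "sourceid"]


def rows_from(source):
    """Yield one list per <ActivityRecord>, using "" for any missing field."""
    root = ET.parse(source).getroot()  # source: a path or an open file
    for rec in root.findall("a:ActivityRecord", NS):
        yield [rec.findtext(f"a:{field}", "", NS) for field in FIELDS]


def write_csv(sources, out):
    """Write the header once, then the records of every source in order."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for source in sources:
        writer.writerows(rows_from(source))


RECORD = ("<ActivityRecord><time>2015-09-16T04:13:20Z</time><oper>Create_Product</oper>"
          "<pkgEid>{eid}</pkgEid><pkgName>{name}</pkgName></ActivityRecord>")
DOC = '<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">{}</Activity_Logs>'

# Two in-memory "files"; real code would pass paths, e.g. from glob.glob("*.xml").
file_a = io.StringIO(DOC.format(RECORD.format(eid=10, name="BBCWRL")))
file_b = io.StringIO(DOC.format(RECORD.format(eid=18, name="CNNINT")))

out = io.StringIO()
write_csv([file_a, file_b], out)
print(out.getvalue())
```

With real files, open the output with open("new.csv", "w", newline="") as the csv module documentation recommends. Writing rows as lists also avoids joining fields with commas and splitting them again, which breaks as soon as a value itself contains a comma.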

Parsing all XML fields with pandas:

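import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

# One dict per <ActivityRecord>. ElementTree prefixes every tag with the
# default namespace ("{http://www.cisco.com/PowerKEYDVB/Auditing}time"),
# so keep only the part after "}" to get clean column names.
l = [{child.tag.split('}')[-1]: child.text for child in r} for r in root]

# Elements missing from a record end up as NaN (an empty cell in the CSV)
df = pd.DataFrame(l)
df.to_csv('file.csv', index=False)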
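For comparison, the same one-dict-per-record idea can be written without pandas. This sketch strips the namespace from each tag, takes the union of the tag names (in first-seen order) as the header, and lets csv.DictWriter fill missing elements with an empty string, where pandas would put NaN. The sample is a trimmed version of the question's XML with pkgEid removed from the second record:

```python
import csv
import io
import xml.etree.ElementTree as ET

xml_text = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord><time>2015-09-16T04:13:20Z</time><oper>Create_Product</oper>
    <pkgEid>10</pkgEid><pkgName>BBCWRL</pkgName></ActivityRecord>
  <ActivityRecord><time>2015-09-16T04:13:20Z</time><oper>Create_Product</oper>
    <pkgName>CNNINT</pkgName></ActivityRecord>
</Activity_Logs>"""


def local(tag):
    """'{uri}time' -> 'time'; tags without a namespace are returned unchanged."""
    return tag.rpartition("}")[2]


root = ET.fromstring(xml_text)
records = [{local(child.tag): child.text or "" for child in rec} for rec in root]

# Union of all column names, keeping the order in which they first appear.
columns = list(dict.fromkeys(key for rec in records for key in rec))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, restval="")  # missing -> ""
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

Discovering the columns from the data means new element types show up automatically, at the cost of a header that depends on which elements happen to occur in the file.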

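If this is a one-off operation, use pyxmlparser.

Disclaimer: I am the author of the library, and it is fairly new. Any feedback would be greatly appreciated. It is a command-line utility.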


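Which elements do you want to extract from the XML file? Please give more details.
time, oper, pkgEid and pkgName are the elements I want to extract.
I get this error: AttributeError: 'NoneType' object has no attribute 'text', because in the XML "pkgEid" is sometimes empty. If it is empty, the field should be left blank.
So please check the file name.
In my code it is a different file name, although I did change it.
2018-04-01T03:30:28Z Deactivate_Dhct 18:55:0F:47:03:2D
Let me try it right away.
Vishnuoww, you have not provided a usage example.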