Parsing XML into a DataFrame
I'm having trouble parsing some XML into a pandas DataFrame. This is what the XML looks like:
<listing>
<seller_info>
<seller_name> cubsfantony</seller_name>
<seller_rating> 848</seller_rating>
</seller_info>
<payment_types>Visa/MasterCard, Money Order/Cashiers Checks, Personal Checks, See item description for payment methods accepted
</payment_types>
<shipping_info>Buyer pays fixed shipping charges, Will ship to United States only
</shipping_info>
<buyer_protection_info>
</buyer_protection_info>
<auction_info>
<current_bid>$620.00 </current_bid>
<time_left> 4 days, 14 hours + </time_left>
<high_bidder>
<bidder_name> gosha555@excite.com </bidder_name>
<bidder_rating>-2 </bidder_rating>
</high_bidder>
<num_items>1 </num_items>
<num_bids> 12</num_bids>
<started_at>$1.00 </started_at>
<bid_increment> </bid_increment>
<location> USA/Chicago</location>
<opened> Nov-27-00 04:57:50 PST</opened>
<closed> Dec-02-00 04:57:50 PST</closed>
<id_num> 511601118</id_num>
<notes> </notes>
</auction_info>
<bid_history>
<highest_bid_amount>$620.00 </highest_bid_amount>
<quantity> 1</quantity>
</bid_history>
<item_info>
<memory> 256MB PC133 SDram</memory>
<hard_drive> 30 GB 7200 RPM IDE Hard Drive</hard_drive>
<cpu>Pentium III 933 System </cpu>
<brand> </brand>
<description> NEW Pentium III 933 System - 133 MHz BUS Speed Pentium Motherboard.....
</description>
</item_info>
</listing>
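Here I only show how to parse a single listing. If the file contains several listings, loop over them with a for loop and apply the same code to each one.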
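from lxml import etree

# etree.parse() returns an ElementTree; getroot() gives the <listing> element
listing = etree.parse('ebay.xml').getroot()
d = {}
# Note: leaf sections directly under <listing> (e.g. <payment_types>) have no children, so they are skipped
for e in listing:              # top-level sections: seller_info, auction_info, ...
    for c in e:                # children of each section
        if len(c) == 0:
            # leaf element: store its text under its tag name
            d[c.tag] = c.text
        else:
            # nested element such as <high_bidder>: go one level deeper
            for ce in c:
                d[ce.tag] = ce.text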
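The loop above only looks two levels down and skips leaf sections that sit directly under <listing>, such as <payment_types> and <shipping_info>. A minimal sketch of a recursive alternative is below. It swaps in the standard library's xml.etree.ElementTree (same element iteration and len() behaviour as lxml for these calls), parses a trimmed excerpt of the question's XML from a string instead of 'ebay.xml', and strips the stray whitespace. The helper name flatten is hypothetical, and it assumes tag names are unique within a listing (a repeated tag overwrites the earlier value).

```python
import xml.etree.ElementTree as ET


def flatten(elem, out=None):
    """Hypothetical helper: map every leaf tag under elem to its stripped text.

    Recurses to any depth, so top-level leaves (<payment_types>) and nested
    ones (<bidder_name> inside <high_bidder>) are both collected.
    Empty or whitespace-only leaves become None.
    """
    if out is None:
        out = {}
    for child in elem:
        if len(child) == 0:
            text = (child.text or "").strip()
            out[child.tag] = text or None
        else:
            flatten(child, out)
    return out


# Trimmed excerpt of the listing from the question
sample = """<listing>
  <seller_info>
    <seller_name> cubsfantony</seller_name>
    <seller_rating> 848</seller_rating>
  </seller_info>
  <payment_types>Visa/MasterCard, Personal Checks</payment_types>
  <auction_info>
    <current_bid>$620.00 </current_bid>
    <high_bidder>
      <bidder_name> gosha555@excite.com </bidder_name>
      <bidder_rating>-2 </bidder_rating>
    </high_bidder>
    <bid_increment> </bid_increment>
  </auction_info>
</listing>"""

d = flatten(ET.fromstring(sample))
print(d)
```

Recursion keeps the code independent of how deeply a given section is nested, at the cost of losing the section name (seller_info, auction_info) from each key.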
From here, you can append each d to a list and then pass that list of dicts to pandas.DataFrame to get one row per listing.
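A minimal sketch of the for loop over several listings followed by the DataFrame conversion. It assumes the listings are wrapped in a single hypothetical <root> element; the second listing's values are invented purely for illustration. Each row is built with elem.iter(), which walks every descendant, keeping only leaves (len(e) == 0).

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical input: two <listing> elements under one <root>.
# The second listing's values are made up for illustration only.
xml_text = """<root>
  <listing>
    <seller_info><seller_name> cubsfantony</seller_name></seller_info>
    <auction_info><current_bid>$620.00 </current_bid><num_bids> 12</num_bids></auction_info>
  </listing>
  <listing>
    <seller_info><seller_name> another_seller</seller_name></seller_info>
    <auction_info><current_bid>$35.50 </current_bid><num_bids> 3</num_bids></auction_info>
  </listing>
</root>"""

root = ET.fromstring(xml_text)

rows = []
for listing in root.iter("listing"):
    # One dict per listing: leaf tag -> text with surrounding whitespace stripped
    d = {e.tag: (e.text or "").strip() for e in listing.iter() if len(e) == 0}
    rows.append(d)

# A list of dicts becomes one row per dict; keys become columns
df = pd.DataFrame(rows)
print(df)
```

If a listing lacks a tag that another one has, pandas fills that cell with NaN, so listings do not need identical fields.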
The output looks like this:
{'bid_increment': ' ',
'bidder_name': ' gosha555@excite.com ',
'bidder_rating': '-2 ',
'brand': ' ',
...
'seller_name': ' cubsfantony',
'seller_rating': ' 848',
'started_at': '$1.00 ',
'time_left': ' 4 days, 14 hours + '}
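Every value in the output is still a string with leading or trailing spaces, and amounts carry a "$" prefix. A minimal sketch of cleaning a few of these columns once they are in a DataFrame, using values copied from the output above; which columns to treat as numeric is an assumption you would adjust for the real data.

```python
import pandas as pd

# Raw strings as produced by the parsing step (values from the output above)
rows = [{
    "seller_rating": " 848",
    "bidder_rating": "-2 ",
    "started_at": "$1.00 ",
    "time_left": " 4 days, 14 hours + ",
}]
df = pd.DataFrame(rows)

# Strip surrounding whitespace from every (string) column
df = df.apply(lambda col: col.str.strip())

# Convert numeric-looking columns; drop the literal "$" before converting amounts
df["seller_rating"] = pd.to_numeric(df["seller_rating"])
df["bidder_rating"] = pd.to_numeric(df["bidder_rating"])
df["started_at"] = pd.to_numeric(df["started_at"].str.replace("$", "", regex=False))

print(df.dtypes)
```

Free-text fields such as time_left are left as strings; parsing them would need format-specific rules.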