XML to XLSX conversion in Python


I've looked everywhere for an answer, but there doesn't seem to be a definitive one. Here is what I have:

from selenium import webdriver

chromedriver_path = ("localchromedrive/chromedriver.exe")
chromeOptions = webdriver.ChromeOptions()
MSCI_dir = ("mylocaldrive")
prefs = {"download.default_directory" : MSCI_dir}
chromeOptions.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chromedriver_path,chrome_options=chromeOptions)
url = "https://www.ishares.com/us/239637/fund-download.dl"
driver.get(url)
The file is now downloaded to the local path and saved under the following name:

temp_path = "mylocaldrive\iShares-MSCI-Emerging-Markets-ETF_fund.xls"
This file is saved with the ".xls" file type, but it is clearly an XML file, as opening it in Notepad shows.

I tried xlrd:

import xlrd
book = xlrd.open_workbook(temp_path)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf<?xml'
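
The `found b'\xef\xbb\xbf<?xml'` part of the error shows that the file begins with a UTF-8 byte order mark followed by an XML declaration, not with a binary Excel header. A minimal sketch for checking this directly from the first bytes of a file is below. The function name `sniff_container` and its return labels are made up for illustration; the signatures are the well-known magic numbers for OLE2 compound files (classic `.xls`) and ZIP archives (the container used by `.xlsx`).

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # classic binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP archive, e.g. .xlsx
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_container(head: bytes) -> str:
    """Guess the container type from the first bytes of a file.

    Hypothetical helper: returns "ole2", "zip", "xml" or "unknown".
    """
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"  # a ZIP is not necessarily an xlsx, only a candidate
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]  # ignore the byte order mark
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

Reading the first 16 bytes or so of `temp_path` in binary mode (`open(temp_path, "rb").read(16)`) and passing them to the helper would be enough to tell which parser is worth trying.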
I tried xlwings:

import xlwings as xw
wb = xw.Book(temp_path)
wb.save(xlsx_path)
wb.close()
This appeared to work, but when I try to read the result with pandas, I get:

import pandas as pd
pd.read_excel(xlsx_path)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf<?xml'
I tried BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(temp_path), "xml")

In [1]: soup
Out[1]: <?xml version="1.0" encoding="utf-8"?>

In [2]: soup.contents
Out[2]: []

In [3]: soup.get_text()
Out[3]: ''

I think your problem is that the file is not an XLS file but an XLSX file, which is a special XML-based format that Microsoft created to reduce the size of documents compared with XLS files.

I ran into the same problem. In the end, I had to read the file as an XML file and rebuild the XML into an xlsx file.
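
The approach suggested in one of the answers, reading the file as XML and rebuilding it as an xlsx workbook, might look roughly like the sketch below. It rests on an assumption the question does not confirm: that the download uses the Excel 2003 "XML Spreadsheet" layout (`Workbook` / `Worksheet` / `Table` / `Row` / `Cell` / `Data` elements in the `urn:schemas-microsoft-com:office:spreadsheet` namespace) and is well-formed XML. The function names `rows_from_spreadsheetml` and `write_xlsx` are hypothetical, and `openpyxl` is a third-party package that has to be installed separately.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the Excel 2003 "XML Spreadsheet" namespace.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def _cell_value(data):
    """Convert a <Data> element to a Python value (numbers become floats)."""
    if data is None or data.text is None:
        return None
    if data.get(f"{{{SS}}}Type") == "Number":
        try:
            return float(data.text)
        except ValueError:
            return data.text
    return data.text


def rows_from_spreadsheetml(xml_bytes):
    """Return {sheet_name: [[cell values], ...]} parsed from the XML bytes."""
    if xml_bytes.startswith(b"\xef\xbb\xbf"):
        xml_bytes = xml_bytes[3:]  # drop the UTF-8 byte order mark
    root = ET.fromstring(xml_bytes)
    sheets = {}
    for ws in root.iterfind("ss:Worksheet", NS):
        rows = []
        for row in ws.iterfind("ss:Table/ss:Row", NS):
            values = []
            for cell in row.iterfind("ss:Cell", NS):
                # ss:Index (1-based) means the cell skips some empty columns.
                index = cell.get(f"{{{SS}}}Index")
                if index is not None:
                    values.extend([None] * (int(index) - 1 - len(values)))
                values.append(_cell_value(cell.find("ss:Data", NS)))
            rows.append(values)
        sheets[ws.get(f"{{{SS}}}Name", "Sheet")] = rows
    return sheets


def write_xlsx(sheets, xlsx_path):
    """Write the parsed sheets to a real xlsx file with openpyxl."""
    from openpyxl import Workbook  # third-party: pip install openpyxl

    wb = Workbook()
    default_sheet = wb.active
    for name, rows in sheets.items():
        ws = wb.create_sheet(title=name[:31])  # Excel caps titles at 31 chars
        for row in rows:
            ws.append(row)
    if sheets:
        wb.remove(default_sheet)  # keep it only if nothing else was written
    wb.save(xlsx_path)
```

With the question's variables, this would be called as `write_xlsx(rows_from_spreadsheetml(open(temp_path, "rb").read()), xlsx_path)`, after which `pd.read_excel(xlsx_path)` should be reading a genuine xlsx file. The sketch ignores row-level `ss:Index`, merged cells, and styles, and it does not sanitise sheet names that contain characters Excel rejects.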