Forcing an XML file to be saved in XLS format in Python


python xml excel


I have a piece of code here that downloads this fund's data in Excel 2004 XML format:

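from urllib.request import urlopen  # Python 3 replacement for urllib2

url = 'https://www.ishares.com/us/258100/fund-download.dl'
# Download the fund data (served in Excel 2004 XML format)
with urlopen(url) as s:
    contents = s.read()
# Write the raw bytes unchanged
with open("export.xml", 'wb') as file:
    file.write(contents)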

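My goal is to convert this file to .xls programmatically and then read that file into a DataFrame. I know I could parse the file with Python's XML libraries. However, I noticed that if I open the XML file and manually save it with an .xls extension, pandas can read it and I get the result I want.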
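The XML-parsing route mentioned above can be sketched with the standard library alone. This is a minimal sketch, assuming the download is well-formed SpreadsheetML (the Excel 2003/2004 XML Spreadsheet layout of Workbook, Worksheet, Table, Row, Cell, Data); the function name `worksheet_rows` and the sample document are made up for illustration and are not taken from the iShares file.

```python
import xml.etree.ElementTree as ET

# Namespace used by Excel 2003/2004 XML Spreadsheet (SpreadsheetML) documents.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def worksheet_rows(xml_text, sheet_index=0):
    """Return one worksheet as a list of rows, each a list of cell strings."""
    root = ET.fromstring(xml_text)
    sheets = root.findall("ss:Worksheet", NS)
    table = sheets[sheet_index].find("ss:Table", NS)
    rows = []
    for row in table.findall("ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # ss:Index (1-based) marks a jump over empty cells; pad to keep columns aligned.
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            values.append("" if data is None or data.text is None else data.text)
        rows.append(values)
    return rows


# Invented two-row sample standing in for the downloaded file.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Holdings">
  <Table>
   <Row><Cell><Data ss:Type="String">Name</Data></Cell><Cell><Data ss:Type="String">Price</Data></Cell></Row>
   <Row><Cell><Data ss:Type="String">Bond A</Data></Cell><Cell ss:Index="2"><Data ss:Type="Number">101.5</Data></Cell></Row>
  </Table>
 </Worksheet>
</Workbook>"""

rows = worksheet_rows(SAMPLE)
print(rows)  # [['Name', 'Price'], ['Bond A', '101.5']]
```

Once the header row has been located, the remaining rows could be handed to `pandas.DataFrame(rows[1:], columns=rows[0])`. Every value comes back as a string, so numeric columns would still need converting.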

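I also tried renaming the file extension with the code below, but this approach does not force a re-save. The file remains the underlying XML document, just with an .xls extension.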

import os

folder = os.path.expanduser('~/models')  # expand "~" to the home directory
for filename in os.listdir(folder):
    if filename.startswith('export'):
        # Swap only the extension; the contents remain XML
        newname = os.path.splitext(filename)[0] + '.xls'
        os.rename(os.path.join(folder, filename), os.path.join(folder, newname))
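To see why a rename is not a conversion, it can help to look at the leading bytes of the file rather than its name. The sketch below is illustrative only: `sniff_format` is a hypothetical helper, and it relies on the fact that binary .xls workbooks are OLE2 compound files that begin with the signature D0 CF 11 E0 A1 B1 1A E1, whereas the Excel 2004 XML export is plain text starting with `<`.

```python
import os
import tempfile

# First 8 bytes of an OLE2 compound file, the container used by binary .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(path):
    """Guess the real on-disk format from the leading bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(OLE2_MAGIC):
        return "binary xls (OLE2)"
    if head.startswith(b"PK\x03\x04"):
        return "zip container (e.g. xlsx)"
    # Drop a UTF-8 byte order mark and leading whitespace before checking for XML.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<"):
        return "xml text"
    return "unknown"


# Demonstration on a throwaway file: rename export.xml to export.xls and sniff it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export.xml")
    dst = os.path.join(tmp, "export.xls")
    with open(src, "wb") as f:
        f.write(b'<?xml version="1.0"?><Workbook/>')
    os.rename(src, dst)
    result = sniff_format(dst)

print(result)  # xml text
```

The renamed file still reports as XML text, which matches the behaviour described above: only a real re-save through Excel (or another writer) changes the bytes on disk.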

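With Excel for Windows, consider connecting Python to the Excel object library over COM using the win32com module. Specifically, use Excel's OpenXML and SaveAs methods to save the downloaded XML as CSV, as in the win32com script further below.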

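With Excel for Mac, consider a VBA solution, since VBA is the language most commonly used to interface with the Excel object library. The macro below downloads the iShares XML and then saves it as CSV using the OpenXML and SaveAs methods, ready for import into pandas.

Note: this has not been tested on a Mac, but hopefully the Microsoft.XMLHTTP object is available.

VBA (saved in a macro-enabled workbook)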

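Option Explicit

Sub DownloadXML()
On Error GoTo ErrHandle
    Dim wb As Workbook
    Dim xmlDoc As Object
    Dim xmlfile As String, csvfile As String

    xmlfile = ActiveWorkbook.Path & "\file.xml"
    csvfile = ActiveWorkbook.Path & "\file.csv"

    Call DownloadFile("https://www.ishares.com/us/258100/fund-download.dl", xmlfile)

    Set wb = Excel.Workbooks.OpenXML(xmlfile)
    wb.SaveAs csvfile, 6
    wb.Close True

ExitSub:
    Set wb = Nothing
    Set xmlDoc = Nothing
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Resume ExitSub
End Sub

Function DownloadFile(url As String, filePath As String)
    Dim WinHttpReq As Object, oStream As Object

    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", url, False
    WinHttpReq.send

    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile filePath, 2 ' 1 = no overwrite, 2 = overwrite
        oStream.Close
    End If

    Set WinHttpReq = Nothing
    Set oStream = Nothing
End Function

Python: load the resulting CSV with pandas, as in the read_csv script further below.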

I found that the website I was using has developed an API, which let me bypass web scraping altogether. I then used Python's requests module.

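"... if I open the XML file and manually save it": using what application? Excel? If it is Excel and you don't care about performance, you can use OLE from Python to perform the same conversion you now do by hand.

@Sophos Yes, I save it manually with Excel. Thanks, I'll look into oletools.

I should have specified earlier that I'm running this script on Mac OS, so I can't use the win32com client.

I actually knew that, since you referenced Excel 2004 and there is no such Windows version. This may still help future readers. Consider building the same macro version and calling it from Python as a command line.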

import os
import win32com.client as win32    
import requests as r
import pandas as pd

cd = os.path.dirname(os.path.abspath(__file__))

url = "http://www.ishares.com/us/258100/fund-download.dl"
xmlfile = os.path.join(cd, 'iSharesDownload.xml')
csvfile = os.path.join(cd, 'iSharesDownload.csv')

# DOWNLOAD FILE
try:
    rqpage = r.get(url)
    with open(xmlfile, 'wb') as f:
        f.write(rqpage.content)    
except Exception as e:
    print(e)    
finally:
    rqpage = None

# EXCEL COM TO SAVE EXCEL XML AS CSV
if os.path.exists(csvfile):
    os.remove(csvfile)
try:
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.OpenXML(xmlfile)
    wb.SaveAs(csvfile, 6)
    wb.Close(True)    
except Exception as e:
    print(e)    
finally:
    # RELEASES RESOURCES
    wb = None
    excel = None

# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())

#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000

import pandas as pd

csvfile = "/path/to/file.csv"

# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())

#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000
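Both scripts above hard-code `skiprows=8`, which silently breaks if the export ever gains or loses a preamble line. One possible alternative is to search for the header row first. The sketch below is an assumption-laden illustration: `find_header_row` is a hypothetical helper, the column names "Weight (%)" and "Price" are taken from the `describe()` output shown above, and the sample text is invented rather than copied from the real file.

```python
import csv
import io


def find_header_row(text, required=("Weight (%)", "Price")):
    """Return the 0-based index of the first CSV row containing every required column name."""
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        cells = {cell.strip() for cell in row}
        if all(name in cells for name in required):
            return i
    raise ValueError("header row not found")


# Made-up stand-in for the exported file: preamble lines, a blank line, then the table.
sample = 'Fund,Example\nAs of,"Jan 1, 2016"\n\nName,Weight (%),Price\nBond A,0.13,102.94\n'

header_at = find_header_row(sample)
print(header_at)  # 3
```

The returned index could then replace the literal, as in `pd.read_csv(csvfile, skiprows=header_at)`. That substitution only holds when no quoted field in the preamble spans several physical lines, because `skiprows` counts lines while `csv.reader` counts records.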

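import requests

url = "https://www.blackrock.com/tools/hackathon/performance"
tickers = ["SHYG"]  # hypothetical placeholder; the original does not show this list
results = {}
for ticker in tickers:
    params = {'identifiers': ticker,
              'returnsType': 'MONTHLY'}
    response = requests.get(url, params=params)
    results[ticker] = response.json()  # parsed JSON payload for this ticker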
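When debugging a call like the one above, it can be useful to see the exact URL that the `params` dictionary turns into. The sketch below uses only the standard library to build the same query string; `build_url` is a hypothetical helper, and the ticker "SHYG" is a placeholder chosen for illustration, not a value confirmed by the thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.blackrock.com/tools/hackathon/performance"


def build_url(base, ticker, returns_type="MONTHLY"):
    """Append the two query parameters used in the thread to the endpoint URL."""
    query = urlencode({"identifiers": ticker, "returnsType": returns_type})
    return f"{base}?{query}"


full_url = build_url(BASE_URL, "SHYG")  # "SHYG" is a placeholder ticker
print(full_url)
# https://www.blackrock.com/tools/hackathon/performance?identifiers=SHYG&returnsType=MONTHLY
```

Pasting the printed URL into a browser is a quick way to inspect the JSON structure before deciding how to load it into a DataFrame.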