从多个表中提取第一行并添加一列(Python)
我正在尝试从investment.com上生成一份最新货币报价列表 我有以下代码:从多个表中提取第一行并添加一列(Python),python,pandas,dataframe,Python,Pandas,Dataframe,我正在尝试从investment.com上生成一份最新货币报价列表 我有以下代码: head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36", "X-Requested-With": "XMLHttpRequest"} ISO_Code=[] Latest=[] for item
head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"}
ISO_Code=[]
Latest=[]
for item in ISO_CURR_ID.ISO_Code[:4]:
url = 'http://www.investing.com/currencies/usd-'+item+'-historical-data'
r = requests.get(url, headers=head)
soup = BeautifulSoup(r.content, 'html.parser')
try:
CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0]
Item='USD/'+item
ISO_Code.append(np.array(Item))
# Latest.append(np.array(CurrHistoricRange[:1]))
Latest.append(CurrHistoricRange[:1])
except:
pass
其中ISO_CURR_ID.ISO_代码为:
In [69]:ISO_CURR_ID.ISO_Code[:4]
Out[69]:
0 EUR
1 GBP
2 JPY
3 CHF
我需要的最终格式是这样的表
ISO_Code Date Price Open High Low Change %
0 EUR Jun 21, 2016, 0.8877, 0.8833, 0.8893, 0.881, -0.14%
但是我很难理解如何在不重复列名的情况下合并这些第一行。因此,如果我使用
Final=pd.DataFrame(dict(ISO_Code = ISO_Code, Latest_Quotes = Latest))
Final
Out[71]:
ISO_Code Latest_Quotes
0 USD/EUR Date Price Open High Low...
1 USD/GBP Date Price Open High Lo...
2 USD/JPY Date Price Open High Low...
3 USD/CHF Date Price Open High Low...
我认为这是一个更干净的方式来完成你想做的事情
head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"}
latest_data=[]
for item in ISO_CURR_ID.ISO_Code:
url = 'http://www.investing.com/currencies/usd-'+item+'-historical-data'
r = requests.get(url, headers=head)
soup = BeautifulSoup(r.content, 'html.parser')
try:
CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0]
Item='USD/'+item
data = CurrHistoricRange.iloc[0].to_dict()
data["ISO_Code"] = Item
latest_data.append(data)
except Exception as e:
print(e)
def getDf(latest_list, order = ["ISO_Code", "Date", "Price", "Open", "High", "Low", "Change %"]):
return pd.DataFrame(latest_list, columns=order)
getDf(latest_data)
产出:
ISO_Code Date Price Open High Low Change %
0 USD/EUR Jun 21, 2016 0.8882 0.8833 0.8893 0.8810 0.55%
1 USD/GBP Jun 21, 2016 0.6822 0.6815 0.6829 0.6766 0.10%
2 USD/JPY Jun 21, 2016 104.75 103.82 104.82 103.60 0.88%
3 USD/CHF Jun 21, 2016 0.9613 0.9620 0.9623 0.9572 -0.07%
我建议您使用类似于pandas_datareader的:
import requests
from bs4 import BeautifulSoup
import pandas as pd
head = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
ISO_Code=[]
Latest=[]
URL = 'http://www.investing.com/currencies/usd-{}-historical-data'
dfs = {}
curr_ser = pd.Series(['EUR','GBP','JPY','CHF'])
#for item in ISO_CURR_ID.ISO_Code[:4]:
for item in curr_ser:
url = URL.format(item)
r = requests.get(url, headers=head)
soup = BeautifulSoup(r.content, 'html.parser')
try:
Item='USD/'+item
dfs[Item] = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0]
#CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0]
#ISO_Code.append(np.array(Item))
#Latest.append(np.array(CurrHistoricRange[:1]))
#Latest.append(CurrHistoricRange[:1])
except:
pass
# create Panel out of dictionary of DataFrames
p = pd.Panel(dfs)
# slice first row from all DFs
t = p[:,0,:]
print(t)
print(t.T)
输出:
USD/CHF USD/EUR USD/GBP USD/JPY
Date Jun 21, 2016 Jun 21, 2016 Jun 21, 2016 Jun 21, 2016
Price 0.9618 0.8887 0.6828 104.97
Open 0.962 0.8833 0.6815 103.82
High 0.9623 0.8893 0.6829 104.97
Low 0.9572 0.881 0.6766 103.6
Change % -0.02% 0.61% 0.19% 1.09%
Date Price Open High Low Change %
USD/CHF Jun 21, 2016 0.9618 0.962 0.9623 0.9572 -0.02%
USD/EUR Jun 21, 2016 0.8887 0.8833 0.8893 0.881 0.61%
USD/GBP Jun 21, 2016 0.6828 0.6815 0.6829 0.6766 0.19%
USD/JPY Jun 21, 2016 104.97 103.82 104.97 103.6 1.09%
如果我们对DF的索引(按日期)进行如下排序:
dfs[Item] = pd.read_html(r.content,
attrs = {'id': 'curr_table'},
flavor="bs4",
parse_dates=['Date'],
index_col=[0]
)[0].sort_index()
# create Panel out of dictionary of DataFrames
p = pd.Panel(dfs)
现在我们可以做很多有趣的事情:
In [18]: p.axes
Out[18]:
[Index(['USD/CHF', 'USD/EUR', 'USD/GBP', 'USD/JPY'], dtype='object'),
DatetimeIndex(['2016-05-23', '2016-05-24', '2016-05-25', '2016-05-26', '2016-05-27', '2016-05-30', '2016-05-31', '2016-06-01', '2016-06-02', '20
16-06-03', '2016-06-06', '2016-06-07', '2016-06-08',
'2016-06-09', '2016-06-10', '2016-06-13', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-17', '2016-06-19', '2016-06-20', '20
16-06-21'],
dtype='datetime64[ns]', name='Date', freq=None),
Index(['Price', 'Open', 'High', 'Low', 'Change %'], dtype='object')]
In [19]: p.keys()
Out[19]: Index(['USD/CHF', 'USD/EUR', 'USD/GBP', 'USD/JPY'], dtype='object')
In [22]: p.to_frame().head(10)
Out[22]:
USD/CHF USD/EUR USD/GBP USD/JPY
Date minor
2016-05-23 Price 0.9896 0.8913 0.6904 109.23
Open 0.9905 0.8913 0.6893 110.08
High 0.9924 0.8942 0.6925 110.25
Low 0.9879 0.8893 0.6872 109.08
Change % -0.06% 0.03% 0.12% -0.84%
2016-05-24 Price 0.9933 0.8976 0.6833 109.99
Open 0.9892 0.891 0.6903 109.22
High 0.9946 0.8983 0.6911 110.12
Low 0.9882 0.8906 0.6827 109.14
Change % 0.37% 0.71% -1.03% 0.70%
按货币对和日期编制索引
In [25]: p['USD/EUR', '2016-06-10':'2016-06-15', :]
Out[25]:
Price Open High Low Change %
Date
2016-06-10 0.8889 0.8835 0.8893 0.8825 0.59%
2016-06-13 0.8855 0.8885 0.8903 0.8846 -0.38%
2016-06-14 0.8922 0.8856 0.8939 0.8846 0.76%
2016-06-15 0.8881 0.892 0.8939 0.8848 -0.46%
按货币对列出的指数
In [26]: p['USD/EUR', :, :]
Out[26]:
Price Open High Low Change %
Date
2016-05-23 0.8913 0.8913 0.8942 0.8893 0.03%
2016-05-24 0.8976 0.891 0.8983 0.8906 0.71%
2016-05-25 0.8964 0.8974 0.8986 0.8953 -0.13%
2016-05-26 0.8933 0.8963 0.8975 0.8913 -0.35%
2016-05-27 0.8997 0.8931 0.9003 0.8926 0.72%
2016-05-30 0.8971 0.8995 0.9012 0.8969 -0.29%
2016-05-31 0.8983 0.8975 0.8993 0.8949 0.13%
2016-06-01 0.8938 0.8981 0.9 0.8929 -0.50%
2016-06-02 0.8968 0.8937 0.8974 0.8911 0.34%
2016-06-03 0.8798 0.8968 0.8981 0.8787 -1.90%
2016-06-06 0.8807 0.8807 0.8831 0.8777 0.10%
2016-06-07 0.8804 0.8805 0.8821 0.8785 -0.03%
2016-06-08 0.8777 0.8803 0.8812 0.8762 -0.31%
2016-06-09 0.8837 0.877 0.8847 0.8758 0.68%
2016-06-10 0.8889 0.8835 0.8893 0.8825 0.59%
2016-06-13 0.8855 0.8885 0.8903 0.8846 -0.38%
2016-06-14 0.8922 0.8856 0.8939 0.8846 0.76%
2016-06-15 0.8881 0.892 0.8939 0.8848 -0.46%
2016-06-16 0.8908 0.8879 0.8986 0.8851 0.30%
2016-06-17 0.8868 0.8907 0.8914 0.885 -0.45%
2016-06-19 0.8813 0.8822 0.8841 0.8811 -0.63%
2016-06-20 0.8833 0.8861 0.8864 0.8783 0.23%
2016-06-21 0.8891 0.8833 0.8893 0.881 0.66%
按日期索引
In [28]: p[:, '2016-06-20', :]
Out[28]:
USD/CHF USD/EUR USD/GBP USD/JPY
Price 0.962 0.8833 0.6815 103.84
Open 0.9599 0.8861 0.6857 104.63
High 0.9633 0.8864 0.6881 104.84
Low 0.9576 0.8783 0.6794 103.78
Change % 0.22% 0.23% -0.61% -0.75%
In [29]: p[:, :, 'Change %']
Out[29]:
USD/CHF USD/EUR USD/GBP USD/JPY
Date
2016-05-23 -0.06% 0.03% 0.12% -0.84%
2016-05-24 0.37% 0.71% -1.03% 0.70%
2016-05-25 -0.20% -0.13% -0.42% 0.18%
2016-05-26 -0.20% -0.35% 0.18% -0.38%
2016-05-27 0.55% 0.72% 0.31% 0.42%
2016-05-30 -0.25% -0.29% -0.07% 0.82%
2016-05-31 0.14% 0.13% 1.10% -0.38%
2016-06-01 -0.55% -0.50% 0.42% -1.07%
2016-06-02 0.23% 0.34% -0.04% -0.61%
2016-06-03 -1.45% -1.90% -0.66% -2.14%
2016-06-06 -0.56% 0.10% 0.55% 0.97%
2016-06-07 -0.55% -0.03% -0.71% -0.19%
2016-06-08 -0.62% -0.31% 0.28% -0.35%
2016-06-09 0.55% 0.68% 0.30% 0.10%
2016-06-10 0.02% 0.59% 1.42% -0.10%
2016-06-13 -0.03% -0.38% -0.11% -0.68%
2016-06-14 -0.11% 0.76% 1.11% -0.13%
2016-06-15 -0.21% -0.46% -0.64% -0.08%
2016-06-16 0.40% 0.30% 0.03% -1.67%
2016-06-17 -0.54% -0.45% -1.08% -0.12%
2016-06-19 0.00% -0.63% -1.55% 0.48%
2016-06-20 0.22% 0.23% -0.61% -0.75%
2016-06-21 0.02% 0.66% 0.35% 0.98%
双轴索引
In [30]: p[:, '2016-06-10':'2016-06-15', 'Change %']
Out[30]:
USD/CHF USD/EUR USD/GBP USD/JPY
Date
2016-06-10 0.02% 0.59% 1.42% -0.10%
2016-06-13 -0.03% -0.38% -0.11% -0.68%
2016-06-14 -0.11% 0.76% 1.11% -0.13%
2016-06-15 -0.21% -0.46% -0.64% -0.08%
你能发布一个想要的输出吗(5-7行就足够了)?想要的输出是个问题