Python: KeyError when reshaping a DataFrame with melt
I have a dataframe that currently looks like the following, with 2628 rows and 101 columns. I want to convert the years row, the one holding the numbers 0.08333, 0.16666, 0.249999 and so on, into a column:
years Currency 0.08333333 0.16666666 0.24999999 0.33333332 \
2005-01-04 GBP 4.709456 4.633861 4.586271 4.567017
2005-01-05 GBP 4.713099 4.649220 4.606802 4.588313
2005-01-06 GBP 4.707237 4.646861 4.609294 4.593076
The code is shown below, where combined_data is the dataframe. I used melt to do this, but I get the error KeyError: 'years' and I am not sure how to deal with it:
from pandas.io.excel import read_excel
import pandas as pd
import numpy as np
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8)
short_end_spot_curve = read_excel(url, sheetname=6)
# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
spot_curve.columns.name = 'years'
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
short_end_spot_curve.columns.name = 'years'
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]
# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)
def filter_func(group):
    # keep only groups whose rows have at most 50 missing values
    return group.isnull().sum(axis=1) <= 50
combined_data = combined_data.groupby(level=0).filter(filter_func)
idx = 0
values = ['GBP'] * len(combined_data.index)
combined_data.insert(idx, 'Currency', values)
print(combined_data)
pd.melt(combined_data,id_vars=['years']) #ERROR!
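A likely reading of the error, offered as a sketch rather than a confirmed diagnosis: the code above sets 'years' through columns.name, which only names the column axis and does not create a column called 'years', while id_vars in melt is looked up among the column labels. The small frame below is a hypothetical stand-in for combined_data, built from the first sample values, and it reproduces the same kind of failure:

```python
import pandas as pd

# Hypothetical miniature of combined_data: dates in the index,
# maturities (in years) as the column labels.
df = pd.DataFrame(
    [[4.709456, 4.633861], [4.713099, 4.649220]],
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
    columns=[0.08333333, 0.16666666],
)
df.columns.name = "years"        # names the column axis; adds no column
df.insert(0, "Currency", "GBP")

# 'years' is not one of the column labels, so melt cannot use it as an id.
has_years_column = "years" in df.columns

try:
    pd.melt(df, id_vars=["years"])
    melt_failed = False
except KeyError:
    melt_failed = True

print(has_years_column, melt_failed)
```

If this matches the real data, the dates live in the index and the maturities in the column labels, so neither is available to melt as an id column until it is turned into a regular column.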
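This may need adjusting depending on the number of rows relative to the number of columns, but it gets the desired result (more or less):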
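@John: I have edited the question. Yes, it is the "years" row, not a column, and I have added the desired result.
Cool, that is clearer now.
Thanks for your answer. I realized I made a mistake in how I wanted the output structured, but I have accepted your answer because it solves the question I asked here.
I don't mind at all if you want to uncheck the answer and redo the question. In general, though, writing a new question gets better answers, because edits to old questions are often ignored (I think). Either way is fine.
I have rewritten the question, if you are interested.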
years Currency
0.08333333 2005-01-04 GBP 4.709456 4.633861 4.586271 4.567017
0.16666666 2005-01-05 GBP 4.713099 4.649220 4.606802 4.588313
0.24999999 2005-01-06 GBP 4.707237 4.646861 4.609294 4.593076
years Currency 0.08333333 0.16666666 0.24999999 0.33333332
0 2005-01-04 GBP 4.709456 4.633861 4.586271 4.567017
1 2005-01-05 GBP 4.713099 4.649220 4.606802 4.588313
2 2005-01-06 GBP 4.707237 4.646861 4.609294 4.593076
df['x'] = df.columns.values[-4:-1]
df = df.set_index('x',drop=True)
df.columns = ['years','Currency','v1','v2','v3','v4']
years Currency v1 v2 v3 v4
x
0.08333333 2005-01-04 GBP 4.709456 4.633861 4.586271 4.567017
0.16666666 2005-01-05 GBP 4.713099 4.649220 4.606802 4.588313
0.24999999 2005-01-06 GBP 4.707237 4.646861 4.609294 4.593076
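For comparison, here is a minimal sketch of getting a long layout with melt itself, assuming the dates are in the index and the maturities are the column labels. The index name "date" and the value column name "rate" are hypothetical names chosen for the illustration, and the frame is rebuilt from the sample values above rather than from the real file:

```python
import pandas as pd

# Hypothetical miniature of the data: dates in the index,
# maturities (in years) as the column labels.
df = pd.DataFrame(
    [[4.709456, 4.633861, 4.586271],
     [4.713099, 4.649220, 4.606802]],
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
    columns=[0.08333333, 0.16666666, 0.24999999],
)
df.index.name = "date"           # assumed name, not in the original code
df.insert(0, "Currency", "GBP")

# reset_index turns the dates into an ordinary column so melt can keep it
# as an id; the remaining column labels become the values of 'years'.
long_df = pd.melt(
    df.reset_index(),
    id_vars=["date", "Currency"],
    var_name="years",
    value_name="rate",
)
print(long_df)
```

This gives one row per date and maturity, which is a different shape from the indexed table above, so which one fits depends on the output actually wanted.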