Python: Appending a DataFrame to an Index

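My data has a lot of nesting. I have 6 time periods (don't worry about those), each time period has 19 quantiles, and each quantile has a 51x51 covariance matrix (covering all US states plus DC). Expressed as a dictionary, I would have: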

# Illustrative layout only; the "..." lines stand for the elided quantiles and time periods.
my_data = {'time_pd_1' : {0.05 : pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states)},
           'time_pd_2' : {0.05 : pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states)},
           ...
           'time_pd_6' : {0.05 : pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states)}}
Simple enough, but that is not how the data is created. I have two for loops that do the work:

for tpd in time_periods:
    for q in quantiles:
        tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)
If I print tdf, it looks like this:

ST              Alabama         Alaska          Arizona         ...     West Virginia   Wisconsin   Wyoming
ST                                                                                                             
Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
...             ...             ...             ...             ...     ...             ...         ...
West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
Now, what I want is this:

cov = {}
for tpd in time_periods:
    cov[tpd] = pd.DataFrame(index=[str(round(q,2)) for q in quantiles])
    for q in quantiles:
        tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)
        cov[tpd].loc[str(round(q,2)), :] = tdf
So that, if I print cov[tpd], it looks like this:

        ST              Alabama         Alaska          Arizona         ...     West Virginia   Wisconsin   Wyoming
q       ST                                                                                                             
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.05    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.10    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
...     ...             ...             ...             ...             ...     ...             ...         ...
...     ...             ...             ...             ...             ...     ...             ...         ...
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.90    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.95    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
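Having a final structure like this would make my life much easier, and I will gladly buy a beer for whoever gets me there. Besides the assignment above, I have tried several other approaches: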

cov[tpd].loc[str(round(q,2)), :] = tdf # Raises ValueError: Incompatible indexer with DataFrame
cov[tpd].loc[str(round(q,2)), :].append(tdf) # Almost gives me the frame I need, but removes the index level q, and inserts a column 0 with NaNs
cov[tpd].loc[str(round(q,2)), :].join(tdf, how='outer') # Raises AttributeError: 'Series' object has no attribute 'join'
pd.merge(cov[tpd].loc[str(round(q,2)), :], tdf, how='outer') # Raises AttributeError: 'Series' object has no attribute 'columns'
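The two AttributeErrors above share a cause that a small experiment makes visible: selecting a single row label with .loc returns a Series, not a DataFrame. The sketch below uses a hypothetical two-row stand-in for cov[tpd] (the name frame and its labels are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature of cov[tpd]: quantile labels as the row index, no columns yet.
frame = pd.DataFrame(index=["0.05", "0.1"])

# Selecting one row label yields a one-dimensional Series.
row = frame.loc["0.05", :]

print(type(row).__name__)
print(hasattr(row, "columns"))  # Series objects carry no .columns attribute
```

Because the selection is one-dimensional, it has no place to hold a two-dimensional 51x51 block, which is why the row index needs a second level before a whole matrix can be stored per quantile.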
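I understand all of the error messages, and I also have a potential fix: create the DataFrame cov[tpd] up front in the shape I want, then use indexing to insert the output of cov_var(). But that takes a few extra lines of code to build the MultiIndex for cov[tpd] and insert the data. Does anyone know a better way?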
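One possible alternative, offered as a sketch rather than a definitive answer, is to collect the per-quantile frames in a dictionary and let pd.concat build the outer index level from the keys. Everything below is a stand-in: three states and three quantiles instead of 51 and 19, random data, and np.cov substituting for the custom cov_var(), which the question says cannot be replaced this way in the real case.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real inputs.
states = ["Alabama", "Alaska", "Arizona"]
quantiles = [0.05, 0.10, 0.95]
rng = np.random.default_rng(0)


def cov_var(data):
    # Placeholder for the author's custom covariance routine (assumption: plain np.cov).
    return np.cov(data, rowvar=False)


frames = {}
for q in quantiles:
    data_for_q = rng.normal(size=(20, len(states)))  # fake observations
    frames[str(round(q, 2))] = pd.DataFrame(
        cov_var(data_for_q), index=states, columns=states
    )

# Dict keys become the outer index level; each frame's own index becomes the inner level.
stacked = pd.concat(frames, names=["q", "ST"])
print(stacked.loc["0.05"])
```

Selecting stacked.loc["0.05"] returns that quantile's full matrix. Note that str(round(0.10, 2)) produces the label "0.1", not "0.10".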



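Note: cov_var() is a simple covariance function I wrote myself. My situation is a bit unusual, so I cannot use a built-in function such as np.cov().

So I finally gave in and used the approach I hinted at in the question above. It actually seems to be faster than the approaches I kept trying. All is well. Here is what I ended up doing: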

cov = {}
ind_lev_1 = [str(round(q,2)) for q in quantiles]
ind_lev_2 = states
index = pd.MultiIndex.from_product([ind_lev_1, ind_lev_2], names=['QUANTILE', 'STATE'])
columns = pd.Index(ind_lev_2, name='STATE')

for tpd in time_periods:
    cov[tpd] = pd.DataFrame(index=index, columns=columns)
    for q in quantiles:
        q = str(round(q,2))
        cov[tpd].loc[(q,), :] = cov_var(arr=data_for_q, means=pop_means_for_q)
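For anyone who wants to run the approach end to end, here is a scaled-down sketch under stated assumptions: three states, three quantiles, two time periods, random input data, and np.cov standing in for cov_var(). It also passes dtype=float when pre-allocating, which the code above does not do, so that the result holds numeric columns rather than object columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real inputs.
states = ["Alabama", "Alaska", "Arizona"]
quantiles = [0.05, 0.10, 0.95]
time_periods = ["time_pd_1", "time_pd_2"]
rng = np.random.default_rng(0)


def cov_var(arr):
    # Placeholder for the author's custom covariance routine (assumption: plain np.cov).
    return np.cov(arr, rowvar=False)


labels = [str(round(q, 2)) for q in quantiles]
index = pd.MultiIndex.from_product([labels, states], names=["QUANTILE", "STATE"])
columns = pd.Index(states, name="STATE")

cov = {}
for tpd in time_periods:
    # Pre-allocate a NaN-filled frame with one block of rows per quantile.
    cov[tpd] = pd.DataFrame(index=index, columns=columns, dtype=float)
    for label in labels:
        data_for_q = rng.normal(size=(20, len(states)))  # fake observations
        # The partial key (label,) selects every row of that quantile's block.
        cov[tpd].loc[(label,), :] = cov_var(data_for_q)

print(cov["time_pd_1"].loc["0.05"])
```

Each block can then be read back with cov[tpd].loc[label], which returns the square matrix for that quantile.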
