Python Pandas: setting values with a partial slice on a MultiIndex

python pandas indexing

I have a piece of code that generates the following empty DataFrame:

>>> import itertools
>>> import numpy as np
>>> import pandas as pd

>>> first = ['foo', 'bar']
>>> second = ['baz', 'can']
>>> third = ['ok', 'ko']
>>> colours = ['blue', 'yellow', 'green']

>>> idx = pd.IndexSlice
>>> ix = pd.MultiIndex.from_arrays(np.array([i for i in itertools.product(first, second, third)]).transpose().tolist(),
                                   names=('first', 'second', 'third'))
>>> df1 = pd.DataFrame(index=ix, columns=colours).sort_index()
>>> print(df1)

                   blue yellow green
first second third                  
bar   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
foo   baz    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN
      can    ko     NaN    NaN   NaN
             ok     NaN    NaN   NaN

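What I intend to do is fill this empty, MultiIndex-based DataFrame from another given, column-based DataFrame, which looks like this (columns truncated for clarity):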

So far, I have been trying it like this:

idx = pd.IndexSlice
for s in second:
    for t in third:
        for c in colours:
            column_name = '{s}_{t}_{c}'.format(s=s, c=c, t=t)
            values = df2[column_name]
            df1.loc[idx[:, s, t], c] = values

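On each iteration, the values Series is determined correctly, but pandas does not match the index of values against the first level of df1's MultiIndex. As a result, all df1 values remain NaN, because pandas is trying to align a MultiIndex with a single-level index. Is there a way to solve this?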

Basically, to give a higher-level view, I am just trying to rearrange df2 (based on string column names) into the shape of df1 (based on a MultiIndex).
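One way to keep the loop from the question is to do the alignment by hand: reorder each column of df2 to follow the first-level labels of the selected rows, then assign a plain NumPy array so that pandas does not attempt any index alignment. The sketch below assumes df2 is indexed by the labels in first and has columns named '{second}_{third}_{colour}'; the df2 built here is a hypothetical stand-in with made-up numbers, since the real one is not shown.

```python
import itertools
import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third],
                                names=('first', 'second', 'third'))
df1 = pd.DataFrame(index=ix, columns=colours, dtype=float).sort_index()

# Hypothetical stand-in for df2: one row per 'first' label,
# one column per '{second}_{third}_{colour}' combination
cols = ['{s}_{t}_{c}'.format(s=s, t=t, c=c)
        for s, t, c in itertools.product(second, third, colours)]
data = np.arange(len(first) * len(cols), dtype=float).reshape(len(first), len(cols))
df2 = pd.DataFrame(data, index=first, columns=cols)

idx = pd.IndexSlice
for s in second:
    for t in third:
        # First-level labels of the rows being written, in df1's own order
        labels = df1.loc[idx[:, s, t], :].index.get_level_values('first')
        for c in colours:
            values = df2['{s}_{t}_{c}'.format(s=s, t=t, c=c)]
            # Reorder by label, then drop the index so assignment is positional
            df1.loc[idx[:, s, t], c] = values.reindex(labels).to_numpy()

print(df1)
```

This keeps the original structure but runs one assignment per column of df2, so a vectorised reshape is likely preferable for larger frames.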

You can first create a MultiIndex on the columns with str.split, then reshape with stack, and finally reindex to match df1:

# df is the column-based DataFrame (df2 in the question)
df.columns = df.columns.str.split('_', expand=True)
print(df)
          baz                 can                 baz
           ok        ko        ok        ko        ok
         blue      blue      blue      blue    yellow
foo -1.385111 -1.014812 -1.419643  1.540341  0.663933
bar  0.445372 -0.226087  0.450982 -1.114169  0.896522

# Move the first two column levels into the rows, then align with df1
df = df.stack([0, 1]).reindex(index=df1.index, columns=df1.columns)
print(df)
                        blue    yellow  green
first second third                           
bar   baz    ko    -0.226087       NaN    NaN
             ok     0.445372  0.896522    NaN
      can    ko    -1.114169       NaN    NaN
             ok     0.450982       NaN    NaN
foo   baz    ko    -1.014812       NaN    NaN
             ok    -1.385111  0.663933    NaN
      can    ko     1.540341       NaN    NaN
             ok    -1.419643       NaN    NaN

Thanks, that's great. However, some values seem to get lost in the final step: they are still there after stack, but disappear after reindex, and most end up as NaN.

Can you reproduce it with sample data?

That may be because the second and colour columns can in fact have the same labels. I'm going to try changing that now.

Works great! Thanks a lot for your help.

@Jivan - Super, have a nice day!