Python: replicating n rows of a DataFrame using a second-level index?


python pandas


For example, I have a pandas DataFrame that looks like this:

df
            Values
Timestamp
2020-02-01       A
2020-02-02       B
2020-02-03       C

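To simplify later processing, I would like to keep a window of n rows, replicate it for every timestamp, and create a second-level index holding the local integer position within the window.

With n=2, this would give: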

df_new
                                Values
Timestamp   2nd_level_index
2020-02-01                0        NaN
                          1          A
2020-02-02                0          A
                          1          B
2020-02-03                0          B
                          1          C

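Is there any built-in functionality that can help me do this? A rolling window of fixed size n seems like a starting point, but how can I replicate the window and store it for every row using a second-level index?

Thanks in advance for your help! Best,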

Edit 04/05

Using the proposed code and slightly changing the output format, I adapted it to a 2-column DataFrame.

I ended up with the following code:

import pandas as pd
import numpy as np
from random import seed, randint

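def transpose_n_rows(df: pd.DataFrame, n_rows: int) -> pd.DataFrame:
    """For each row, lay out the current and the previous n_rows-1 values of every column side by side."""
    # Transpose to (n_columns, n_obs) and left-pad every column with n_rows-1 NaNs
    array = np.concatenate((np.full((len(df.columns),n_rows-1), np.nan), df.transpose()), axis=1)

    # Build a strided view of shape (n_columns, n_obs, n_rows): one window per timestamp
    shape = array.shape[:-1] + (array.shape[-1] - n_rows + 1, n_rows)
    strides = array.strides + (array.strides[-1],)
    array = np.lib.stride_tricks.as_strided(array, shape=shape, strides=strides)

    # Column labels: (original column, position inside the window)
    midx = pd.MultiIndex.from_product([df.columns, range(n_rows)], names=['Data','Position'])
    # Join the per-column blocks side by side into shape (n_obs, n_columns * n_rows)
    transposed = pd.DataFrame(np.concatenate(array, axis=1), index=df.index, columns=midx)

    return transposed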

n = 4
start = '2020-01-01 00:00+00:00'
end = '2020-01-01 12:00+00:00'

pr2h = pd.period_range(start=start, end=end, freq='2h')
seed(1)
values1 = [randint(0,10) for ts in pr2h]
values2 = [randint(20,30) for ts in pr2h]
df2h = pd.DataFrame({'Values1' : values1, 'Values2': values2}, index=pr2h)

df2h_new = transpose_n_rows(df2h, n)

This gives us:

In [29]:df2h
Out[29]: 
                  Values1  Values2
2020-01-01 00:00        2       27
2020-01-01 02:00        9       30
2020-01-01 04:00        1       26
2020-01-01 06:00        4       23
2020-01-01 08:00        1       21
2020-01-01 10:00        7       27
2020-01-01 12:00        7       20

In [30]:df2h_new
Out[30]: 
Data             Values1                Values2                  
Position               0    1    2    3       0     1     2     3
2020-01-01 00:00     NaN  NaN  NaN  2.0     NaN   NaN   NaN  27.0
2020-01-01 02:00     NaN  NaN  2.0  9.0     NaN   NaN  27.0  30.0
2020-01-01 04:00     NaN  2.0  9.0  1.0     NaN  27.0  30.0  26.0
2020-01-01 06:00     2.0  9.0  1.0  4.0    27.0  30.0  26.0  23.0
2020-01-01 08:00     9.0  1.0  4.0  1.0    30.0  26.0  23.0  21.0
2020-01-01 10:00     1.0  4.0  1.0  7.0    26.0  23.0  21.0  27.0
2020-01-01 12:00     4.0  1.0  7.0  7.0    23.0  21.0  27.0  20.0

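However, I call this function, transpose_n_rows, in a for loop over a large number of DataFrames. After a first use, I am a bit worried about performance.

I have read that repeated calls to np.concatenate or pd.concat should be avoided, and here I have two such calls that could perhaps be bypassed.

Do you have any suggestions for getting rid of them, if that is possible?

Thank you in advance for your help! Best,

I don't think pandas has a built-in method for this.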
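No single built-in produces this layout, but as a pandas-only sketch (not from the original answer), shifted copies of the frame can be stacked with `pd.concat`. The names `shifted` and `df_new` are illustrative, and the small frame is rebuilt here from the question's example.

```python
import pandas as pd

# Rebuild the question's example frame (assumed dtypes: strings, DatetimeIndex)
df = pd.DataFrame(
    {"Values": ["A", "B", "C"]},
    index=pd.DatetimeIndex(
        ["2020-02-01", "2020-02-02", "2020-02-03"], name="Timestamp"
    ),
)
n = 2

# Position k holds the value from (n - 1 - k) rows earlier, so the last
# position (k = n - 1) is the row's own value and earlier ones are padded with NaN
shifted = {k: df.shift(n - 1 - k) for k in range(n)}

# Stack the copies; the dict keys become an outer index level, which is then
# swapped behind Timestamp and sorted to group the window of each timestamp
df_new = pd.concat(shifted, names=["2nd_level_index"]).swaplevel().sort_index()
print(df_new)
```

This makes n shifted copies of the data, so it is mainly convenient for small n; the NumPy approach in the answer avoids those copies.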

A possible solution for generating the rolling 2D array:

n = 2
#added Nones for first values of 2d array
x = np.concatenate([[None] * (n-1), df['Values']])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = rolling_window(x, n)
print (a)
[[None 'A']
 ['A' 'B']
 ['B' 'C']]
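As an alternative sketch, assuming NumPy 1.20 or later, `sliding_window_view` builds the same kind of rolling 2D array without computing shapes and strides by hand. It returns a read-only view by default, which guards against the accidental writes that the NumPy documentation warns about for `as_strided`. The sample values are assumed for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n = 2
values = np.array([1.0, 2.0, 3.0])  # assumed numeric sample data

# Left-pad with n-1 NaNs so that the first row also gets a full window
x = np.concatenate([np.full(n - 1, np.nan), values])

# One window of length n per original row; the result is a read-only view
a = sliding_window_view(x, n)
print(a)
```

The resulting array has one row per original value, with the row's own value in the last column.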

Then create a MultiIndex with MultiIndex.from_product and flatten the array values with np.ravel. If the values are numeric, pad with np.nan as the missing value instead of None:

print (df)
            Values
Timestamp         
2020-02-01       1
2020-02-02       2
2020-02-03       3

n = 2
x = np.concatenate([[np.nan] * (n-1), df['Values']])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = rolling_window(x, n)
print (a)
[[nan  1.]
 [ 1.  2.]
 [ 2.  3.]]

mux = pd.MultiIndex.from_product([df.index, range(n)], names=('times','level1'))
df = pd.DataFrame({'Values': np.ravel(a)}, index=mux)
print (df)

                   Values
times      level1        
2020-02-01 0          NaN
           1          1.0
2020-02-02 0          1.0
           1          2.0
2020-02-03 0          2.0
           1          3.0

Comments:

Are A, B, C numbers? Or arbitrary values, possibly strings?

Hello @jezrael, A, B, C are actually numbers.

Hello @jezrael. Thank you very much for your answer. It already helped me build a first version. I modified it slightly to handle DataFrames with several columns. In this version of the code I use np.concatenate twice, which we can perhaps avoid. I show the modified version in the edited question. I am worried about performance because this function is called in a for loop. Do you have any suggestions for eliminating these calls to np.concatenate()? Thanks for your help, best.
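Regarding the follow-up about the two np.concatenate calls in transpose_n_rows, one possible sketch (not from the original answer) is to write the data into a preallocated NaN-padded buffer and take the windows along the row axis, so that a single reshape replaces the second concatenate. It assumes NumPy 1.20 or later for `sliding_window_view` and all-numeric columns; the function name `transpose_n_rows_prealloc` and the `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def transpose_n_rows_prealloc(df: pd.DataFrame, n_rows: int) -> pd.DataFrame:
    n_obs, n_cols = df.shape
    # Preallocate a float buffer with n_rows-1 leading NaN rows (replaces the padding concatenate)
    padded = np.full((n_obs + n_rows - 1, n_cols), np.nan)
    padded[n_rows - 1:] = df.to_numpy(dtype=float)
    # Read-only view of shape (n_obs, n_cols, n_rows): one window per row and column
    windows = sliding_window_view(padded, n_rows, axis=0)
    # A single reshape (one copy) gives (n_obs, n_cols * n_rows), grouped by original column
    flat = windows.reshape(n_obs, n_cols * n_rows)
    midx = pd.MultiIndex.from_product(
        [df.columns, range(n_rows)], names=["Data", "Position"]
    )
    return pd.DataFrame(flat, index=df.index, columns=midx)


# Same values as the df2h example in the question, with a default integer index
demo = pd.DataFrame(
    {"Values1": [2, 9, 1, 4, 1, 7, 7], "Values2": [27, 30, 26, 23, 21, 27, 20]}
)
out = transpose_n_rows_prealloc(demo, 4)
print(out)
```

Whether this is actually faster than the original function depends on the frame sizes, so it is worth timing both versions on representative data before switching.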