Shorter way of creating new columns in groupby() based on the previous n rows

Tags: arrays, pandas, dataframe, group-by, shift

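I have the following code that, for a sorted Pandas data frame, groups by one column and creates two new columns: one based on the previous 4 rows and the current row in the group, and the other based on the next row in the group.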

So, given the following frame:

   nr  val
0    1   11
1    1   12
2    1   13
3    1   14
4    1   15
5    6   61
6    6   62
7    6   63
8    6   64
9    6   65
10   6   66
11   6   67
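Now I have to group by 'nr', as in the code below, and build a column that holds, for each row, the previous 4 values and the current value of 'val' within the group. Similarly, I build an additional column that holds, for each row, the next value of 'val' within the group.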

import pandas as pd

data_test = {'nr': [1,1,1,1,1,6,6,6,6,6,6,6],
             'val': [11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns=['nr', 'val'])

print(df_test)
# one helper column per lag, shifted within each 'nr' group and zero-filled
df_test['past4'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(4).fillna(0))
df_test['past3'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(3).fillna(0))
df_test['past2'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(2).fillna(0))
df_test['past1'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(1).fillna(0))
# next value within the group
df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
# collect the lags and the current value into one list per row
df_test['amounts'] = df_test[['past4', 'past3', 'past2', 'past1', 'val']].values.tolist()
df_test.drop(columns=['past4', 'past3', 'past2', 'past1'], inplace=True)
df_test

    nr  val future  amounts
0   1   11  12  [0, 0, 0, 0, 11]
1   1   12  13  [0, 0, 0, 11, 12]
2   1   13  14  [0, 0, 11, 12, 13]
3   1   14  15  [0, 11, 12, 13, 14]
4   1   15  0   [11, 12, 13, 14, 15]
5   6   61  62  [0, 0, 0, 0, 61]
6   6   62  63  [0, 0, 0, 61, 62]
7   6   63  64  [0, 0, 61, 62, 63]
8   6   64  65  [0, 61, 62, 63, 64]
9   6   65  66  [61, 62, 63, 64, 65]
10  6   66  67  [62, 63, 64, 65, 66]
11  6   67  0   [63, 64, 65, 66, 67]

I believe I should be able to build the list column called 'amounts' more easily, probably in one line. How can I do that?

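Moving the block into a function makes the code more modular and lighter.

In this particular example we pass reversed(range(5)) as shift_values, which represents the list [4, 3, 2, 1, 0].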
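import pandas as pd

data_test = {'nr': [1,1,1,1,1,6,6,6,6,6,6,6],
             'val': [11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns=['nr', 'val'])

def generate_past(df, shift_values):
    # one shifted, zero-filled Series per requested shift, stacked as rows
    serie = pd.DataFrame([df.groupby('nr')['val'].transform(lambda x: x.shift(shift_value).fillna(0)) for shift_value in shift_values])
    # transpose so that each original row gets its own list of values
    return serie.T.values.tolist()

df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
df_test['amounts'] = generate_past(df_test, reversed(range(5)))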
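
As a possible variation on the helper above, the per-group shift can also be written without a lambda, since the groupby object has its own shift method that takes a fill_value. This is only a sketch under that assumption, not the answerer's code; the names df and g are made up for the illustration. Passing fill_value=0 also keeps the values as integers, whereas shift followed by fillna(0) produces floats.

```python
import pandas as pd

df = pd.DataFrame({'nr': [1,1,1,1,1,6,6,6,6,6,6,6],
                   'val': [11,12,13,14,15,61,62,63,64,65,66,67]})

# group once and reuse the grouped column for every shift
g = df.groupby('nr')['val']

# next value within the group, 0 where there is none
future = g.shift(-1, fill_value=0)

# previous 4 values plus the current one, side by side, then one list per row
amounts = pd.concat([g.shift(i, fill_value=0) for i in range(4, -1, -1)],
                    axis=1).values.tolist()

df['future'] = future
df['amounts'] = amounts
print(df)
```

The list comprehension still performs five shifts, so this is shorter to read rather than cheaper to run.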

Use a custom function that builds the nested lists, like:

import numpy as np

def f(x):
    # list of Series shifted by 4, 3, 2, 1, 0
    L = [x['val'].shift(i).fillna(0) for i in range(4, -1, -1)]
    # next value within the group, stored in another column
    x['future'] = x['val'].shift(-1).fillna(0).astype(int)
    # column filled by lists: transpose so each row gets its own 5 values
    x['amounts'] = pd.Series(np.array(L).astype(int).T.tolist(), index=x.index)
    return x

# group_keys=False keeps the original index on pandas 2.0 and later
df_test = df_test.groupby(['nr'], group_keys=False).apply(f)
print (df_test)
    nr  val  future               amounts
0    1   11      12      [0, 0, 0, 0, 11]
1    1   12      13     [0, 0, 0, 11, 12]
2    1   13      14    [0, 0, 11, 12, 13]
3    1   14      15   [0, 11, 12, 13, 14]
4    1   15       0  [11, 12, 13, 14, 15]
5    6   61      62      [0, 0, 0, 0, 61]
6    6   62      63     [0, 0, 0, 61, 62]
7    6   63      64    [0, 0, 61, 62, 63]
8    6   64      65   [0, 61, 62, 63, 64]
9    6   65      66  [61, 62, 63, 64, 65]
10   6   66      67  [62, 63, 64, 65, 66]
11   6   67       0  [63, 64, 65, 66, 67]
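
The transpose in the function above is the step that is easiest to misread, so here is a small sketch of it on a single group, using the 'val' values of group 6 from the question. The variable names val, stacked and windows are made up for the illustration.

```python
import numpy as np
import pandas as pd

# the 'val' column of one group (nr == 6 in the question)
val = pd.Series([61, 62, 63, 64, 65, 66, 67])

# five shifted copies: shift by 4, 3, 2, 1, 0
L = [val.shift(i).fillna(0) for i in range(4, -1, -1)]

# stacking gives one ROW per shift: shape (5, 7)
stacked = np.array(L).astype(int)
print(stacked.shape)

# transposing gives one row per ORIGINAL row: 7 lists of 5 values
windows = stacked.T.tolist()
for w in windows:
    print(w)
```

Without the .T each list would hold a whole shifted column rather than the five values belonging to one row.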
You can try it like this (the same as jezrael's answer) but without using apply. This is not a great approach, because I am building a new DataFrame.

import pandas as pd

parts = []
for _, grp in df_test.groupby('nr'):
    grp = grp.reset_index(drop=True)
    grp['future'] = grp['val'].shift(-1).fillna(0).astype(int)
    # for row k, shift by len(grp)-1-k so that the last 5 values end at row k
    grp['amounts'] = pd.Series([grp['val'].shift(i).fillna(0).values[-5:] for i in range(len(grp)-1, -1, -1)])
    parts.append(grp)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
df_new = pd.concat(parts, ignore_index=True)
df_new:

    nr  val future  amounts
0   1   11  12  [0.0, 0.0, 0.0, 0.0, 11.0]
1   1   12  13  [0.0, 0.0, 0.0, 11.0, 12.0]
2   1   13  14  [0.0, 0.0, 11.0, 12.0, 13.0]
3   1   14  15  [0.0, 11.0, 12.0, 13.0, 14.0]
4   1   15  0   [11, 12, 13, 14, 15]
5   6   61  62  [0.0, 0.0, 0.0, 0.0, 61.0]
6   6   62  63  [0.0, 0.0, 0.0, 61.0, 62.0]
7   6   63  64  [0.0, 0.0, 61.0, 62.0, 63.0]
8   6   64  65  [0.0, 61.0, 62.0, 63.0, 64.0]
9   6   65  66  [61.0, 62.0, 63.0, 64.0, 65.0]
10  6   66  67  [62.0, 63.0, 64.0, 65.0, 66.0]
11  6   67  0   [63, 64, 65, 66, 67]
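
The values[-5:] slicing in the loop above amounts to taking a trailing window of 5 values for each row. As a different way to express that idea (a swapped-in technique, not what the answer does), the sketch below pads each group with zeros and uses NumPy's sliding_window_view. It assumes NumPy 1.20 or later and that the rows of each group are contiguous, as in the sorted frame from the question; trailing_windows is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def trailing_windows(values, size=5):
    """Hypothetical helper: zero-padded trailing windows of length `size`."""
    # prepend size-1 zeros so the first row also gets a full window
    padded = np.concatenate([np.zeros(size - 1, dtype=values.dtype), values])
    # one window per original value, each ending at that value
    return sliding_window_view(padded, size).tolist()

df = pd.DataFrame({'nr': [1,1,1,1,1,6,6,6,6,6,6,6],
                   'val': [11,12,13,14,15,61,62,63,64,65,66,67]})

# sort=False keeps groups in order of appearance, which matches the row
# order only because each group's rows are contiguous (assumption)
parts = [trailing_windows(grp['val'].to_numpy())
         for _, grp in df.groupby('nr', sort=False)]
df['amounts'] = [w for part in parts for w in part]
print(df)
```

Unlike the loop above, this also yields full-length lists for groups with fewer than 5 rows, because the padding is added before the windows are taken.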

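Great answer. I was thinking about using index.repeat and reindex(4) to create a new df and generating two data frames for each unique nr value, but this is more concise and probably also more memory-efficient.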