Python: Shift part of a DataFrame's rows depending on the value in a specific cell

python pandas dataframe

Suppose we have a DataFrame with the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({
         'Year':[2017, 2019, 2018, 2017, 2017, 2017],
         'B':[4,5,4,5,5,4],
         'C':[0,0,0,0,0,7],
         'D':[0,1,3,5,7,1],
         'E':[5,3,6,9,2,4]})

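The general idea is to shift each row according to the value in the "Year" column: 2017 is the base year, and each row should be shifted to the right by (Year - 2017) cells, with the new cells filled with zeros (0).

P.S. In fact, we need to sum some of the resulting rows pairwise, so that the "Year" is the same for each column.

Summing rows 0 and 2 is only the first step. After that it should be rows 1 and 3, and so on.

So maybe there is some pandas functionality that can do this without shifting first...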

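If you use shift in pandas as is, the values in the last columns are lost. So it is necessary to first add new columns filled with missing values; the number of new columns depends on the largest difference between the non-2017 years and 2017.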
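To see why the extra columns are needed, here is a minimal sketch. The one-row frame demo is made up purely for illustration and is not part of the original data:

```python
import pandas as pd

# hypothetical one-row frame, used only to illustrate the behaviour
demo = pd.DataFrame({'B': [5.0], 'C': [0.0], 'D': [1.0], 'E': [3.0]})

# shift every value two columns to the right; the frame keeps its width
shifted = demo.shift(2, axis=1)
print(shifted)
```

The values 1 and 3 fall off the right edge and the first two cells become NaN, which is why the frame is widened before shifting.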

df = df.set_index('Year')

diff = np.setdiff1d(df.index.dropna().unique(), [2017]).astype(int)
print (diff)
[2018 2019]

df = df.assign(**{f'new{x}':np.nan for x in range(max(diff-2017))})

Then you can use shift in a loop and filter by year in the index:

for y in diff:
    df.loc[y, :] = df.astype(float).shift(y - 2017, axis=1).loc[y, :]

Finally, replace the missing values, cast to integers and convert the index back to a column:

df = df.fillna(0).astype(int).reset_index()
print (df)
   Year  B  C  D  E  new0  new1
0  2017  4  0  0  5     0     0
1  2019  0  0  5  0     1     3
2  2018  0  4  0  3     6     0
3  2017  5  0  5  9     0     0
4  2017  5  0  7  2     0     0
5  2017  4  7  1  4     0     0
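For a wider range of years, the same result can be produced in one step. The following is only a sketch under assumptions: the helper name shift_rows_by_year and its parameters are hypothetical, and it places the values with NumPy indexing instead of calling shift in a loop.

```python
import numpy as np
import pandas as pd

def shift_rows_by_year(df, year_col='Year', base_year=2017, prefix='new'):
    """Shift each row right by (year - base_year) cells, padding with zeros."""
    values = df.drop(columns=year_col).to_numpy()
    offsets = (df[year_col] - base_year).to_numpy()
    if (offsets < 0).any():
        raise ValueError('years before base_year are not handled in this sketch')
    n_rows, n_cols = values.shape
    extra = int(offsets.max())  # number of additional columns needed
    out = np.zeros((n_rows, n_cols + extra), dtype=values.dtype)
    # target column of every source cell: original position plus the row offset
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :] + offsets[:, None]
    out[rows, cols] = values
    names = list(df.columns.drop(year_col)) + [f'{prefix}{i}' for i in range(extra)]
    result = pd.DataFrame(out, columns=names, index=df.index)
    result.insert(0, year_col, df[year_col])
    return result

df = pd.DataFrame({
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [0, 0, 0, 0, 0, 7],
    'D': [0, 1, 3, 5, 7, 1],
    'E': [5, 3, 6, 9, 2, 4],
})
result = shift_rows_by_year(df)
print(result)
```

On the sample data this should give the same table as the loop-based version above.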

Edit:

A solution for the case with an additional column:

df = pd.DataFrame({
         'new':list('abcdef'),
         'Year':[2017, 2019, 2018, 2017, 2017, 2017],
         'B':[4,5,4,5,5,4],
         'C':[0,0,0,0,0,7],
         'D':[0,1,3,5,7,1],
         'E':[5,3,6,9,2,4]})
print (df)
  new  Year  B  C  D  E
0   a  2017  4  0  0  5
1   b  2019  5  0  1  3
2   c  2018  4  0  3  6
3   d  2017  5  0  5  9
4   e  2017  5  0  7  2
5   f  2017  4  7  1  4

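I programmatically created the steps that lead from the first DataFrame to the final one. I did it this way because you are probably looking for a programmatic way to do this, and it may help with the final result. With a little more information I could probably make this process easier: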

import pandas as pd
import numpy as np
df = pd.DataFrame({
         'Year':[2017, 2019, 2018, 2017, 2017, 2017],
         'B':[4,5,4,5,5,4],
         'C':[0,0,0,0,0,7],
         'D':[0,1,3,5,7,1],
         'E':[5,3,6,9,2,4],})

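# loc=len(df)-1 is 5 here, which happens to equal the number of columns,
# so F is appended and G is then inserted just before it (order: ..., E, G, F)
df.insert(column='F',loc=len(df)-1,value=np.zeros(len(df),dtype=int))
df.insert(column='G',loc=len(df)-1,value=np.zeros(len(df),dtype=int))
# transpose so that each original row becomes a column labelled by its year
df1 = df.T
cols =df1.iloc[0]
df1.columns = cols
df1.drop('Year', inplace=True)
# roll the 2019 column down by 2 and the 2018 column down by 1
df1.iloc[0:, [1]] =  np.roll(df1.iloc[0:, [1]], shift=2)
df1.iloc[0:, [2]] =  np.roll(df1.iloc[0:, [2]], shift=1)

df = df1.T.reset_index()
# sum rows 0 and 2 and add the result as a new row
res = df.iloc[2] + df.iloc[0]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, res.to_frame().T], ignore_index=True)
# cast Year to object before writing a string label, and avoid chained assignment
df['Year'] = df['Year'].astype(object)
df.loc[6, 'Year'] = 'res'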

Output:

   Year  B  C  D  E  G  F
0  2017  4  0  0  5  0  0
1  2019  0  0  5  0  1  3
2  2018  0  4  0  3  6  0
3  2017  5  0  5  9  0  0
4  2017  5  0  7  2  0  0
5  2017  4  7  1  4  0  0
6   res  4  4  0  8  6  0
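The pairwise summation can also be done for several pairs at once, without adding rows one by one. This is a sketch only: the shifted frame is typed in by hand from the output above, and the pairs list is a hypothetical reading of "rows 0 and 2, then rows 1 and 3" from the question.

```python
import pandas as pd

# the shifted frame from the output above, typed in by hand
shifted = pd.DataFrame({
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 0, 0, 5, 5, 4],
    'C': [0, 0, 4, 0, 0, 7],
    'D': [0, 5, 0, 5, 7, 1],
    'E': [5, 0, 3, 9, 2, 4],
    'G': [0, 1, 6, 0, 0, 0],
    'F': [0, 3, 0, 0, 0, 0],
})

# hypothetical pairing taken from the question: rows 0 and 2, then rows 1 and 3
pairs = [(0, 2), (1, 3)]

# sum only the value columns; adding the Year labels would be meaningless
value_cols = shifted.columns.drop('Year')
sums = pd.DataFrame(
    {f'{a}+{b}': shifted.loc[[a, b], value_cols].sum() for a, b in pairs}
).T
print(sums)
```

The row labelled 0+2 corresponds to the res row in the output above.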

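"Actually, we need to sum some of the resulting rows pairwise, so that the 'Year' is the same for each column": can you be more specific? What determines which rows should be summed? What does your final DataFrame look like?

@Alexander Cécile Sorry for confusing you. We have to sum rows 0 and 2, rows 1 and 3, and so on. We need to shift because, for example, the cell df.iloc[0,1] relates to 2017 but df.iloc[1,1] relates to 2019, so we cannot sum them.

I'm still not sure I understand. How do you handle the fact that some years occur more than once?

@Alexander Cécile Do you mean "2017"? We don't need to change the rows that contain "2017" in the "Year" column.

In fact, "Year" in the full df is in the range [2017, 2035].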

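# continuation of the "another column" solution; this presumably runs after
# df.set_index(['new', 'Year']) and after adding the new0/new1 columns as before
idx = pd.IndexSlice
for y in diff:
    df.loc[idx[:, y], :] = df.astype(float).shift(y - 2017, axis=1).loc[idx[:, y], :]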

df = df.fillna(0).astype(int).reset_index()
print (df)
  new  Year  B  C  D  E  new0  new1
0   a  2017  4  0  0  5     0     0
1   b  2019  0  0  5  0     1     3
2   c  2018  0  4  0  3     6     0
3   d  2017  5  0  5  9     0     0
4   e  2017  5  0  7  2     0     0
5   f  2017  4  7  1  4     0     0
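The setup steps for this variant are not shown here. A self-contained sketch follows, assuming the frame is indexed with set_index(['new', 'Year']) and widened as in the first solution (both steps are assumptions). It also casts to float up front and selects rows with a boolean mask on the Year level rather than pd.IndexSlice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'new': list('abcdef'),
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [0, 0, 0, 0, 0, 7],
    'D': [0, 1, 3, 5, 7, 1],
    'E': [5, 3, 6, 9, 2, 4],
})

base_year = 2017  # assumed base year, as in the question
df = df.set_index(['new', 'Year'])
years = df.index.get_level_values('Year')
diff = np.setdiff1d(years.unique(), [base_year]).astype(int)

# widen the frame, then work in float so NaN can be stored in every column
df = df.assign(**{f'new{x}': np.nan for x in range(max(diff - base_year))})
df = df.astype(float)

for y in diff:
    mask = years == y
    # assign raw values so that no index alignment is involved
    df.loc[mask, :] = df.shift(y - base_year, axis=1).loc[mask, :].to_numpy()

df = df.fillna(0).astype(int).reset_index()
print(df)
```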
