Python: rows to time-series columns

python pandas dataframe


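I have been analyzing PGA Tour data. For machine-learning purposes, I want the columns to represent a statistic over several weeks. Below is an example of the original data structure.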

import pandas as pd
import numpy as np

data = {'Player Name':['Tiger','Tiger','Tiger','Tiger','Tiger','Tiger','Jack',
                       'Jack','Jack','Jack','Jack','Jack','Jack'], 
        'Date':[1, 2, 4, 6, 7, 9, 1, 3, 4, 6, 9, 10, 11],
        'SG Total':[13, 2, 14, 6, 8, 1, 1, 3, 8, 4, 9, 2, 1]}

df_original = pd.DataFrame(data)

I would like to get the data in the following format:

data = {'Player Name':['Tiger','Tiger','Tiger','Jack','Jack',
                   'Jack','Jack'], 
    'Date':[6, 7, 9, 6, 9, 10, 11],
    'SG Total (Date t-3)':[13, 2, 14, 1, 3, 8, 4],
    'SG Total (Date t-2)':[2, 14, 6, 3, 8, 4, 9],
    'SG Total (Date t-1)':[14, 6, 8, 8, 4, 9, 2],
    'SG Total (Date y)':  [6, 8, 1, 4, 9, 2, 1]}
df_correct = pd.DataFrame(data)

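In the real dataset I'm using, I have about 1,000 columns, so the desired dataset could have about 4,000 columns. As you can see in the desired dataset, I dropped each player's first 3 weeks. Since those first 3 weeks are used to fill in (t-3), (t-2), and (t-1), the dates start at each player's 4th week.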

I initially created a dataset for every week, regardless of whether a player played, and used this code to build the desired dataframe:

#%% Creates weekly dataframes & predictions dataframes

#Creates dataframes of each week
dict_of_weeks = {}

for i in range(1,df_numeric_combined['Date'].nunique()+1):
    dict_of_weeks['Week_{}_df'.format(i)] = df_numeric_combined[df_numeric_combined['Date'] == i]
    dict_of_weeks['Week_{}_df'.format(i)].columns += ' (Week ' + str(i) + ')'
    dict_of_weeks['Week_{}_df'.format(i)].rename(columns={'Player Name (Week ' + str(i) + ')' : 'Player Name' , 'Date (Week ' + str(i) + ')' : 'Date'},inplace=True)


#Creating dataframes for prediction of each week
import functools

dict_of_predictions = {}

df_weeks = []

for i in range(4,df_numeric_combined['Date'].nunique()+1):
    dfs = [dict_of_weeks['Week_'+str(i-3)+'_df'], dict_of_weeks['Week_'+str(i-2)+'_df'], dict_of_weeks['Week_'+str(i-1)+'_df'], dict_of_weeks['Week_'+str(i)+'_df']]

    dict_of_predictions['Week_{}_predictions'.format(i)] = functools.reduce(lambda left,right: pd.merge(left,right,on=['Player Name'], how='outer'), dfs)

    cols = []
    count = 1
    for column in dict_of_predictions['Week_{}_predictions'.format(i)].columns:
        if column == 'Date_y':
            cols.append('Date_y_'+ str(count))
            count+=1
            continue
        cols.append(column)

    dict_of_predictions['Week_{}_predictions'.format(i)].columns = cols

    dict_of_predictions['Week_{}_predictions'.format(i)].drop(columns = ['Date_x', 'Date_y_1'],inplace = True)

    dict_of_predictions['Week_{}_predictions'.format(i)].rename(columns={'Date_y_2':'Date'},inplace=True)

    dict_of_predictions['Week_{}_predictions'.format(i)].columns = dict_of_predictions['Week_{}_predictions'.format(i)].columns.str.replace('(Week ' + str(i-3)+ ')', 'Week t-3').str.replace('(Week ' + str(i-2)+ ')', 'Week t-2').str.replace('(Week ' + str(i-1)+ ')', 'Week t-1').str.replace('(Week ' + str(i)+ ')', 'Week y')

    df_weeks.append(dict_of_predictions['Week_{}_predictions'.format(i)])

#Combines predictions dataframes
df = pd.concat(dict_of_predictions.values(), axis=0, join='inner')

However, this code only works if a player played in consecutive weeks, because it relies on the week number and subtracts 3, 2, and 1 from it.
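To make the problem concrete, here is a small sketch using the sample data from above. It computes, for each player, the gap between one appearance and the previous one; the name `gap` is introduced only for this illustration. Any gap larger than 1 is a case where looking up "week i-1" by number would not find that player's previous start.

```python
import pandas as pd

data = {'Player Name': ['Tiger', 'Tiger', 'Tiger', 'Tiger', 'Tiger', 'Tiger', 'Jack',
                        'Jack', 'Jack', 'Jack', 'Jack', 'Jack', 'Jack'],
        'Date': [1, 2, 4, 6, 7, 9, 1, 3, 4, 6, 9, 10, 11],
        'SG Total': [13, 2, 14, 6, 8, 1, 1, 3, 8, 4, 9, 2, 1]}
df_original = pd.DataFrame(data)

# Previous date on which the same player actually played (NaN for the first start)
prev_played = df_original.groupby('Player Name')['Date'].shift(1)

# Gap between consecutive appearances; 1 means back-to-back weeks
gap = df_original['Date'] - prev_played

print(df_original.assign(gap=gap))
print('appearances preceded by a skipped week:', int((gap > 1).sum()))
```

In this sample, about half of the appearances follow a skipped week, so a lookup by week number would miss them.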

The ultimate goal is to get the data into the format of df_correct.

Thanks

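If I understand your requirement correctly, you can use groupby together with shift on the sorted dataframe to fill in each player's results from the previous weeks: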

## first sort by player and date
df_corrected = df_original.sort_values(['Player Name', 'Date'])
your_columns = ['SG Total']  ## list your 4000 columns here
for col in your_columns:
    for s in [3, 2, 1, 0]:  ## time lags
        df_corrected[f'{col} (Date t-{s})'] = df_corrected.groupby('Player Name')[col].shift(s)
df_corrected.drop(your_columns, axis=1, inplace=True)

Which outputs:
Out[12]:
    Player Name  Date  SG Total (Date t-3)  SG Total (Date t-2)  \
6          Jack     1                  NaN                  NaN
7          Jack     3                  NaN                  NaN
8          Jack     4                  NaN                  1.0
9          Jack     6                  1.0                  3.0
10         Jack     9                  3.0                  8.0
11         Jack    10                  8.0                  4.0
12         Jack    11                  4.0                  9.0
0         Tiger     1                  NaN                  NaN
1         Tiger     2                  NaN                  NaN
2         Tiger     4                  NaN                 13.0
3         Tiger     6                 13.0                  2.0
4         Tiger     7                  2.0                 14.0
5         Tiger     9                 14.0                  6.0

    SG Total (Date t-1)  SG Total (Date t-0)
6                   NaN                    1
7                   1.0                    3
8                   3.0                    8
9                   8.0                    4
10                  4.0                    9
11                  9.0                    2
12                  2.0                    1
0                   NaN                   13
1                  13.0                    2
2                   2.0                   14
3                  14.0                    6
4                   6.0                    8
5                   8.0                    1
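The output above still contains each player's first three appearances, which the question wants removed. One possible way to finish the job is sketched below. It is a variation on the same groupby/shift idea, not a tested recipe for the real dataset: the names `n_lags`, `value_cols`, `pieces`, and `df_lagged` are placeholders introduced here, and `value_cols` stands in for the real list of stat columns. It shifts all value columns at once and joins the pieces with a single `pd.concat`, which may behave better than adding thousands of columns one at a time.

```python
import pandas as pd

data = {'Player Name': ['Tiger', 'Tiger', 'Tiger', 'Tiger', 'Tiger', 'Tiger', 'Jack',
                        'Jack', 'Jack', 'Jack', 'Jack', 'Jack', 'Jack'],
        'Date': [1, 2, 4, 6, 7, 9, 1, 3, 4, 6, 9, 10, 11],
        'SG Total': [13, 2, 14, 6, 8, 1, 1, 3, 8, 4, 9, 2, 1]}
df_original = pd.DataFrame(data)

n_lags = 3                  # number of previous appearances to keep (assumption)
value_cols = ['SG Total']   # placeholder for the real stat columns

# Sort so that shift() walks back through each player's own appearances
df_sorted = df_original.sort_values(['Player Name', 'Date'])
grouped = df_sorted.groupby('Player Name')[value_cols]

# One block of lagged columns per lag, oldest first, then the current values
pieces = [df_sorted[['Player Name', 'Date']]]
for s in range(n_lags, 0, -1):
    pieces.append(grouped.shift(s).add_suffix(f' (Date t-{s})'))
pieces.append(df_sorted[value_cols].add_suffix(' (Date y)'))
df_lagged = pd.concat(pieces, axis=1)

# Keep only rows that have a full history: drop each player's first n_lags rows
has_history = df_sorted.groupby('Player Name').cumcount() >= n_lags
df_lagged = df_lagged[has_history].reset_index(drop=True)

print(df_lagged)
```

Because the lags count a player's previous appearances rather than calendar weeks, gaps in the schedule do not matter. The rows come out sorted by player name, so Jack precedes Tiger, unlike df_correct, and the lagged columns are floats because shift introduces NaN before the filtering step.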