Python: create columns from rows based on the values of a column in a table


I have a DataFrame that looks like this:

PERIOD_START_TIME       ID    temp_ID  value1  value2
06.28.2017 22:00:00     88      1        4       2
06.28.2017 22:00:00     88      2        0       7
06.28.2017 22:00:00     89      2        0       9
06.28.2017 22:00:00     89      1        5       4
06.28.2017 22:00:00     90      1        12      13
06.28.2017 22:00:00     90      2        18      4
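Now I need to end up with half as many rows but twice as many value columns. In effect, each value column is duplicated and the temp_ID is appended to the column name. Put simply, temp_ID should be converted from rows into columns.

Expected output:

PERIOD_START_TIME    ID  value1_tpID1 value1_tpID2  value2_tpID1 value2_tpID2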
06.28.2017 22:00:00  88          4       0            2            7
06.28.2017 22:00:00  89          5       0            4            9
06.28.2017 22:00:00  90          12      18           13           4

Output of print(df.info()):

<class 'pandas.core.frame.DataFrame'>
Int64Index: 189604 entries, 0 to 10595
Data columns (total 12 columns):
PERIOD_START_TIME         189604 non-null object
ID                       189604 non-null int64
temp_ID                  189604 non-null int64
dtypes: float64(4), int64(6), object(2)
memory usage: 18.8+ MB
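You can use set_index together with unstack.

If the triples PERIOD_START_TIME, ID, temp_ID contain duplicates, you need an aggregation function such as mean or sum instead, either with pivot_table or with groupby followed by unstack: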

print (df)
     PERIOD_START_TIME  ID  temp_ID  value1  value2
0  06.28.2017 22:00:00  88        1       4       2 < same PERIOD_START_TIME  ID  temp_ID
1  06.28.2017 22:00:00  88        1       5       3 < same PERIOD_START_TIME  ID  temp_ID
2  06.28.2017 22:00:00  88        2       0       7
3  06.28.2017 22:00:00  89        2       0       9
4  06.28.2017 22:00:00  89        1       5       4
5  06.28.2017 22:00:00  90        1      12      13
6  06.28.2017 22:00:00  90        2      18       4

df = df.pivot_table(index=['PERIOD_START_TIME','ID'], 
                    columns='temp_ID', 
                    values=['value1','value2'],
                    aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88       4.5       0.0       2.5       7.0
1  06.28.2017 22:00:00  89       5.0       0.0       4.0       9.0
2  06.28.2017 22:00:00  90      12.0      18.0      13.0       4.0
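
Whether an aggregation is needed at all depends on whether any (PERIOD_START_TIME, ID, temp_ID) triple repeats. The following is a minimal sketch of such a check, assuming the sample data shown above; the names keys, dup_mask and has_duplicates are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the example above (one key triple is repeated).
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 7,
    "ID": [88, 88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 1, 2, 2, 1, 1, 2],
    "value1": [4, 5, 0, 0, 5, 12, 18],
    "value2": [2, 3, 7, 9, 4, 13, 4],
})

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]

# keep=False flags every row of a repeated group, not only the later repeats.
dup_mask = df.duplicated(subset=keys, keep=False)
has_duplicates = bool(dup_mask.any())

print(has_duplicates)
print(df[dup_mask])  # the rows that share a key triple
```

If has_duplicates is False, a plain reshape without aggregation should be enough; otherwise an aggregation function has to decide how the repeated rows are combined.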

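I get an error in every case, e.g. with aggfunc: index 0 is out of bounds for axis 0 with size 0
Thanks. I have never run into that problem. What does print(df.info()) show?
Every column is a float.
Hmm, what other errors do you get? I don't think float is the problem.
index 0 is out of bounds for axis 0 with size 0, and the last one is: 'PERIOD_START_TIME'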
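
The thread does not establish what caused these errors. As a hedged diagnostic sketch only: an error that consists of just the name 'PERIOD_START_TIME' is what a KeyError looks like, which can happen when the name is not among the columns, for example because it sits in the index or because the header carries stray whitespace. The helper describe_keys and the sample frame with a trailing space are hypothetical and serve only to illustrate the check.

```python
import pandas as pd

def describe_keys(df, keys):
    """Report where each expected key lives: a column, an index level, or nowhere."""
    report = {}
    for key in keys:
        if key in df.columns:
            report[key] = "column"
        elif key in df.index.names:
            report[key] = "index level"
        else:
            report[key] = "missing"
    return report

# Hypothetical frame: note the trailing space in the first header.
df = pd.DataFrame({
    "PERIOD_START_TIME ": ["06.28.2017 22:00:00"],
    "ID": [88],
    "temp_ID": [1],
    "value1": [4],
    "value2": [2],
})
keys = ["PERIOD_START_TIME", "ID", "temp_ID"]

print(df.empty)               # an empty frame is another thing worth ruling out
print(df.columns.tolist())    # shows the exact header strings
before = describe_keys(df, keys)

# Strip surrounding whitespace from the headers and check again.
df.columns = df.columns.str.strip()
after = describe_keys(df, keys)

print(before)
print(after)
```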
# No duplicate key triples: move the keys into the index, then unstack temp_ID into columns
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88         4         0         2         7
1  06.28.2017 22:00:00  89         5         0         4         9
2  06.28.2017 22:00:00  90        12        18        13         4
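
The output above names the columns value1_1, value1_2 and so on, while the question asks for names of the form value1_tpID1. A minimal sketch of one way to get that naming, assuming the duplicate-free sample data from the question; the variable name wide is illustrative.

```python
import pandas as pd

# Sample data from the question (no repeated key triples).
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 6,
    "ID": [88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 2, 2, 1, 1, 2],
    "value1": [4, 0, 0, 5, 12, 18],
    "value2": [2, 7, 9, 4, 13, 4],
})

# Move temp_ID from the row index into a second column level.
wide = df.set_index(["PERIOD_START_TIME", "ID", "temp_ID"]).unstack()

# Each column label is a (value name, temp_ID) pair; join it with a "_tpID" infix.
wide.columns = [f"{name}_tpID{temp_id}" for name, temp_id in wide.columns]
wide = wide.reset_index()

print(wide)
```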
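# Duplicate key triples (same input as in the pivot_table example): aggregate with groupby, then unstack temp_ID
df = df.groupby(['PERIOD_START_TIME','ID','temp_ID']).mean().unstack()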
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88       4.5       0.0       2.5       7.0
1  06.28.2017 22:00:00  89       5.0       0.0       4.0       9.0
2  06.28.2017 22:00:00  90      12.0      18.0      13.0       4.0
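
The answer names sum as an alternative to mean. A minimal sketch of that variant, assuming the duplicated sample data shown earlier; selecting value1 and value2 explicitly is an addition here, meant to keep any other, non-numeric columns of a larger frame out of the aggregation.

```python
import pandas as pd

# Sample data with one repeated key triple (ID 88, temp_ID 1).
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 7,
    "ID": [88, 88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 1, 2, 2, 1, 1, 2],
    "value1": [4, 5, 0, 0, 5, 12, 18],
    "value2": [2, 3, 7, 9, 4, 13, 4],
})

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]

# Sum the repeated rows per key triple, then pivot temp_ID into columns.
wide = df.groupby(keys)[["value1", "value2"]].sum().unstack()
wide.columns = [f"{name}_{temp_id}" for name, temp_id in wide.columns]
wide = wide.reset_index()

print(wide)
```

For ID 88 and temp_ID 1, the two rows are added together (value1 4 + 5, value2 2 + 3) instead of averaged.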