Python: create columns from rows based on the values of a column in a table
I have a DataFrame that looks like this:
PERIOD_START_TIME ID temp_ID value1 value2
06.28.2017 22:00:00 88 1 4 2
06.28.2017 22:00:00 88 2 0 7
06.28.2017 22:00:00 89 2 0 9
06.28.2017 22:00:00 89 1 5 4
06.28.2017 22:00:00 90 1 12 13
06.28.2017 22:00:00 90 2 18 4
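Now I need to end up with half as many rows but twice as many value columns. In effect, each value column is duplicated and the temp_ID is appended to the column name. Simply put, temp_ID is converted from rows into columns.
Expected output:
PERIOD_START_TIME ID value1_tpID1 value1_tpID2 value2_tpID1 value2_tpID2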
06.28.2017 22:00:00 88 4 0 2 7
06.28.2017 22:00:00 89 5 0 4 9
06.28.2017 22:00:00 90 12 18 13 4
<class 'pandas.core.frame.DataFrame'>
Int64Index: 189604 entries, 0 to 10595
Data columns (total 12 columns):
PERIOD_START_TIME 189604 non-null object
ID 189604 non-null int64
temp_ID 189604 non-null int64
dtypes: float64(4), int64(6), object(2)
memory usage: 18.8+ MB
You can use set_index with unstack. Or, if the same triple of PERIOD_START_TIME, ID, temp_ID appears more than once, you need an aggregate function such as mean or sum:
print (df)
PERIOD_START_TIME ID temp_ID value1 value2
0 06.28.2017 22:00:00 88 1 4 2 < same PERIOD_START_TIME ID temp_ID
1 06.28.2017 22:00:00 88 1 5 3 < same PERIOD_START_TIME ID temp_ID
2 06.28.2017 22:00:00 88 2 0 7
3 06.28.2017 22:00:00 89 2 0 9
4 06.28.2017 22:00:00 89 1 5 4
5 06.28.2017 22:00:00 90 1 12 13
6 06.28.2017 22:00:00 90 2 18 4
df = df.pivot_table(index=['PERIOD_START_TIME','ID'],
                    columns='temp_ID',
                    values=['value1','value2'],
                    aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0
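The choice between a plain reshape and an aggregating one can be sketched as a duplicate check on the three key columns. This is a minimal sketch, not part of the original answer: the sample frame is rebuilt by hand from the rows shown above, the names `keys`, `has_dupes` and `wide` are illustrative, and `sum` is used as the aggregate only because the answer mentions it as an alternative to `mean`.

```python
import pandas as pd

# Sample data rebuilt from the rows above; ID 88 / temp_ID 1 appears twice.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 7,
    "ID": [88, 88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 1, 2, 2, 1, 1, 2],
    "value1": [4, 5, 0, 0, 5, 12, 18],
    "value2": [2, 3, 7, 9, 4, 13, 4],
})

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]  # illustrative name

# True if any (PERIOD_START_TIME, ID, temp_ID) combination repeats.
has_dupes = df.duplicated(subset=keys).any()

if has_dupes:
    # Duplicates must be collapsed, so aggregate while reshaping.
    wide = df.pivot_table(index=keys[:2], columns="temp_ID",
                          values=["value1", "value2"], aggfunc="sum")
else:
    # No duplicates: a plain reshape is enough.
    wide = df.set_index(keys).unstack()

# Flatten the (value, temp_ID) column MultiIndex into single names.
wide.columns = [f"{name}_{tid}" for name, tid in wide.columns]
wide = wide.reset_index()
print(wide)
```

Swapping `aggfunc="sum"` for `"mean"` reproduces the averaged output shown above.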
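I get an error in every case, e.g. with aggfunc: index 0 is out of bounds for axis 0 with size 0
Thanks, I have never run into that problem. What does print(df.info()) show?
Every column is a float.
Hmm, and what are the other errors? I don't think float is the problem.
index 0 is out of bounds for axis 0 with size 0, and the last one is: 'PERIOD_START_TIME'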
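When each combination of PERIOD_START_TIME, ID and temp_ID appears only once, set_index followed by unstack is enough:
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()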
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4 0 2 7
1 06.28.2017 22:00:00 89 5 0 4 9
2 06.28.2017 22:00:00 90 12 18 13 4
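The snippet above yields names such as value1_1, while the question asks for value1_tpID1. A self-contained sketch of the same set_index/unstack approach with the requested naming follows; the sample frame is typed in by hand from the question, and `wide` is an illustrative variable name.

```python
import pandas as pd

# Sample data from the question (no duplicate key combinations).
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 6,
    "ID": [88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 2, 2, 1, 1, 2],
    "value1": [4, 0, 0, 5, 12, 18],
    "value2": [2, 7, 9, 4, 13, 4],
})

# Move the keys into the index, then pivot temp_ID (the last level) into columns.
wide = df.set_index(["PERIOD_START_TIME", "ID", "temp_ID"]).unstack()

# Columns are now (value name, temp_ID) pairs; join them as <value>_tpID<temp_ID>.
wide.columns = [f"{name}_tpID{tid}" for name, tid in wide.columns]
wide = wide.reset_index()
print(wide)
```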
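Or, for the data with duplicated PERIOD_START_TIME, ID, temp_ID shown earlier, aggregate with groupby and then unstack:
df = df.groupby(['PERIOD_START_TIME','ID','temp_ID']).mean().unstack()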
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0