Python 3.x: Sorting column names in ascending order after pivoting
I have a Spark DataFrame that looks like this:
+----+-----+-------------+---+
|year|month|feature |cnt|
+----+-----+-------------+---+
|2019|2 |Feature1 |2 |
|2019|2 |Feature2 |5 |
|2019|2 |Feature3 |54 |
|2019|2 |Feature4 |75 |
|2019|2 |... |1 |
|2019|2 |... |85 |
|2019|2 |... |77 |
|2019|2 |... |124|
|2019|2 |... |6 |
|2019|2 |... |362|
|2019|2 |... |74 |
|2019|2 |... |10 |
|2019|3 |Feature1 |10 |
|2019|3 |Feature2 |5 |
...
I can successfully convert the DataFrame to pandas and pivot the year+month combination into columns:
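import pandas as pd

# monthly_counts is the Spark DataFrame shown above
monthly_df = monthly_counts.toPandas()
# Concatenate year and month into one string key, e.g. '20192' or '201910'
monthly_df['yearM'] = monthly_df['year'].astype(str) + monthly_df['month'].astype(str)
del monthly_df['year']
del monthly_df['month']
monthly_pv = pd.pivot_table(monthly_df, values = 'cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv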
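The problem is that the resulting columns come out in the wrong order (even though the original DataFrame is sorted in ascending order).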
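Is there any way to sort the column names of the pivot table in ascending order? That is, the first column after feature would be 20192, then 20193, and so on.

The problem is that the way you name the columns makes them sort lexicographically rather than chronologically: 20191 is followed by 201910, 201911, 201912, and only then 20192. To fix this, you can add a leading zero to single-digit months: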
monthly_df = monthly_counts.toPandas().assign(day=1)
monthly_df['yearM'] = pd.to_datetime(monthly_df[['year','month','day']]).dt.strftime('%Y%m')
del monthly_df['year']
del monthly_df['month']
del monthly_df['day']
monthly_pv = pd.pivot_table(monthly_df, values = 'cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv
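To see the ordering effect in isolation, here is a minimal sketch on a small made-up frame (the sample values are hypothetical and simply stand in for monthly_counts.toPandas()). It uses str.zfill to pad the month, which is an alternative to the to_datetime/strftime approach above and should give the same YYYYMM keys:

```python
import pandas as pd

# Hypothetical sample data standing in for monthly_counts.toPandas().
monthly_df = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019],
    "month": [2, 10, 2, 10],
    "feature": ["Feature1", "Feature1", "Feature2", "Feature2"],
    "cnt": [2, 7, 5, 9],
})

# Unpadded keys: as strings, "201910" sorts before "20192".
unpadded = monthly_df["year"].astype(str) + monthly_df["month"].astype(str)
unpadded_order = sorted(unpadded.unique())

# Zero-padded keys: "201902" sorts before "201910", matching calendar order.
monthly_df["yearM"] = (
    monthly_df["year"].astype(str) + monthly_df["month"].astype(str).str.zfill(2)
)
monthly_pv = pd.pivot_table(
    monthly_df, values="cnt", index=["feature"], columns="yearM"
).reset_index()

print(unpadded_order)             # ['201910', '20192']
print(list(monthly_pv.columns))   # ['feature', '201902', '201910']
```

Because every key now has the same width, string order and calendar order coincide, so pivot_table's default column sorting gives the desired result.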
I seem to be running into a ValueError that reads:
extra keys have been passed to the datetime assemblage: [cnt,feature]
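pandas raises this message when the frame handed to pd.to_datetime contains columns other than date components, so one possible cause is that the whole frame was passed instead of only the year, month and day columns. A minimal sketch that reproduces the error and the working call, using a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with the same columns as in the question, plus day=1.
df = pd.DataFrame({
    "year": [2019, 2019],
    "month": [2, 3],
    "feature": ["Feature1", "Feature1"],
    "cnt": [2, 10],
}).assign(day=1)

# Passing the whole frame fails: 'feature' and 'cnt' are not date components.
try:
    pd.to_datetime(df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Selecting only year/month/day lets pandas assemble the dates.
year_month = pd.to_datetime(df[["year", "month", "day"]]).dt.strftime("%Y%m")

print(error_message)
print(year_month.tolist())  # ['201902', '201903']
```

If the error appears, it is worth checking that the double brackets in monthly_df[['year','month','day']] were kept when copying the code.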