Python 3.x: sorting column names in ascending order after a pivot (pandas)


python-3.x pandas


I have a Spark dataframe that looks like this:

+----+-----+-------------+---+
|year|month|feature      |cnt|
+----+-----+-------------+---+
|2019|2    |Feature1     |2  |
|2019|2    |Feature2     |5  |
|2019|2    |Feature3     |54 |
|2019|2    |Feature4     |75 |
|2019|2    |...          |1  |
|2019|2    |...          |85 |
|2019|2    |...          |77 |
|2019|2    |...          |124|
|2019|2    |...          |6  |
|2019|2    |...          |362|
|2019|2    |...          |74 |
|2019|2    |...          |10 |
|2019|3    |Feature1     |10 |
|2019|3    |Feature2     |5  | 
...

I can successfully convert the dataframe to Pandas and pivot the year+month combination into columns:

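import pandas as pd

# monthly_counts is the Spark dataframe shown above
monthly_df = monthly_counts.toPandas()
# Build a year+month key as a plain string, e.g. 2019 and 2 become '20192'
monthly_df['yearM'] = monthly_df['year'].astype(str) + monthly_df['month'].astype(str)
del monthly_df['year']
del monthly_df['month']

# Pivot so that each distinct yearM value becomes a column
monthly_pv = pd.pivot_table(monthly_df, values='cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv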

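The problem is that the resulting column order is not what I want, even though the original dataframe is sorted in ascending order.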

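Is there any way to sort the column names in ascending order in the pivot table? That is, the first column after the feature column would be the earliest year-month, then the next one, and so on.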

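The problem is that the way you name the columns makes them sort in the wrong order, because the names are compared alphabetically as strings. A column named '20191' is followed by the two-digit months that share its prefix, such as '201910', and only then by '20192'. To fix this, you can add a leading zero to single-digit months: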

# Add a dummy day so that pd.to_datetime can assemble a date from year, month, day
monthly_df = monthly_counts.toPandas().assign(day=1)
# '%Y%m' zero-pads the month, e.g. February 2019 becomes '201902'
monthly_df['yearM'] = pd.to_datetime(monthly_df[['year','month','day']]).dt.strftime('%Y%m')
del monthly_df['year']
del monthly_df['month']
del monthly_df['day']

monthly_pv = pd.pivot_table(monthly_df, values = 'cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv
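To see why the padding matters, here is a small sketch on made-up data. The sample frame and its values are assumptions for illustration, not the asker's data, and the padded key is built with str.zfill rather than pd.to_datetime, which is simply another way to get the same zero-padded string. It compares the pivoted column order with and without padding.

```python
import pandas as pd

# Hypothetical sample data standing in for monthly_counts.toPandas()
sample = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2019],
    "month":   [1, 2, 10, 11],
    "feature": ["Feature1", "Feature1", "Feature1", "Feature1"],
    "cnt":     [2, 5, 54, 75],
})

# Key without padding: month 2 becomes '20192'
unpadded = sample["year"].astype(str) + sample["month"].astype(str)
# Key with padding: month 2 becomes '201902'
padded = sample["year"].astype(str) + sample["month"].astype(str).str.zfill(2)


def pivot_columns(keys):
    """Pivot on the given year-month keys and return the column order."""
    frame = sample.assign(yearM=keys)
    pv = pd.pivot_table(frame, values="cnt", index=["feature"], columns="yearM")
    return list(pv.columns)


unpadded_order = pivot_columns(unpadded)
padded_order = pivot_columns(padded)
print(unpadded_order)  # string order: October and November land before February
print(padded_order)    # string order now matches calendar order
```

Because the padded keys all have the same length, comparing them as strings gives the same result as comparing them as dates, so no separate column sort is needed after the pivot.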

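I seem to be getting a ValueError, which reads:

extra keys have been passed to the datetime assemblage: [cnt,feature]

That error is raised when pd.to_datetime is handed a dataframe containing columns other than the date parts, so pass only the year, month, and day columns, that is, monthly_df[['year','month','day']].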
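The ValueError mentioned above can be reproduced on a tiny frame. This is a minimal sketch under assumptions: the two-row frame is hypothetical and only mirrors the column names from the question. It shows that passing the whole frame to pd.to_datetime fails, while passing only the three date columns works.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as in the question
frame = pd.DataFrame({
    "year":    [2019, 2019],
    "month":   [2, 3],
    "feature": ["Feature1", "Feature1"],
    "cnt":     [2, 10],
}).assign(day=1)

# Passing the whole frame: the non-date columns make the assembly fail
try:
    pd.to_datetime(frame)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing only year, month, day: the dates assemble and format cleanly
year_month = pd.to_datetime(frame[["year", "month", "day"]]).dt.strftime("%Y%m")

print(error_message)
print(year_month.tolist())
```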
