Python 3.x: Sorting column names in ascending order after a pivot

I have a Spark DataFrame that looks like this:

+----+-----+-------------+---+
|year|month|feature      |cnt|
+----+-----+-------------+---+
|2019|2    |Feature1     |2  |
|2019|2    |Feature2     |5  |
|2019|2    |Feature3     |54 |
|2019|2    |Feature4     |75 |
|2019|2    |...          |1  |
|2019|2    |...          |85 |
|2019|2    |...          |77 |
|2019|2    |...          |124|
|2019|2    |...          |6  |
|2019|2    |...          |362|
|2019|2    |...          |74 |
|2019|2    |...          |10 |
|2019|3    |Feature1     |10 |
|2019|3    |Feature2     |5  | 
...
I can successfully convert the DataFrame to pandas and pivot the year+month combination into columns:

monthly_df = monthly_counts.toPandas()
monthly_df['yearM'] = monthly_df['year'].astype(str) + monthly_df['month'].astype(str)
del monthly_df['year']
del monthly_df['month']

monthly_pv = pd.pivot_table(monthly_df, values = 'cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv
The problem is that the column order comes out like this (even though the original DataFrame was sorted in ascending order):


Is there any way to sort the column names in ascending order in the pivot table? That is, the first column after feature would be 20192, then 20193, and so on.

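The problem is that the way you named your columns makes them sort in the wrong order, because the names are strings and are compared character by character: 20191 is followed by 201910, 201911, 201912, and only then by 20192. To fix this, you can add a leading zero to the single-digit months: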

monthly_df = monthly_counts.toPandas().assign(day=1)
monthly_df['yearM'] = pd.to_datetime(monthly_df[['year','month','day']]).dt.strftime('%Y%m')
del monthly_df['year']
del monthly_df['month']
del monthly_df['day']

monthly_pv = pd.pivot_table(monthly_df, values = 'cnt', index=['feature'], columns='yearM').reset_index()
monthly_pv
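The string-ordering issue described above can be reproduced without Spark. The following is a minimal sketch, assuming a small hand-made DataFrame in place of `monthly_counts.toPandas()` (the sample values are invented for illustration). It uses `str.zfill(2)` to pad the month, which is an alternative to the `to_datetime` route and not the answerer's exact code.

```python
import pandas as pd

# Hypothetical stand-in for monthly_counts.toPandas(); the values are made up.
monthly_df = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2019, 2019, 2019],
    "month":   [2, 2, 10, 10, 3, 3],
    "feature": ["Feature1", "Feature2", "Feature1", "Feature2", "Feature1", "Feature2"],
    "cnt":     [2, 5, 7, 1, 10, 5],
})

# Unpadded labels are compared as strings, so October sorts before February.
unpadded = monthly_df["year"].astype(str) + monthly_df["month"].astype(str)
print(sorted(unpadded.unique()))

# Zero-pad the month so that string order matches chronological order.
monthly_df["yearM"] = (
    monthly_df["year"].astype(str) + monthly_df["month"].astype(str).str.zfill(2)
)

# pivot_table sorts the column labels, which are now fixed-width strings.
monthly_pv = pd.pivot_table(
    monthly_df, values="cnt", index="feature", columns="yearM"
).reset_index()
print(list(monthly_pv.columns))
```

With fixed-width labels such as 201902 and 201910, the default column sort of `pivot_table` puts the months in chronological order, so no separate reordering step is needed.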

I seem to be running into a ValueError that reads:
extra keys have been passed to the datetime assemblage: [cnt,feature]
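One way to hit the error in the comment is to pass `pd.to_datetime` a DataFrame that still contains columns other than the date parts. What the commenter actually ran is not shown, so this is only a plausible reproduction, using a hypothetical two-row sample with the same column names as the question. Selecting just `year`, `month` and `day` avoids the error.

```python
import pandas as pd

# Hypothetical sample with the same column names as the question's data.
df = pd.DataFrame({
    "year":    [2019, 2019],
    "month":   [2, 10],
    "feature": ["Feature1", "Feature1"],
    "cnt":     [2, 7],
}).assign(day=1)

# Passing every column fails: 'feature' and 'cnt' are not date components.
try:
    pd.to_datetime(df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Restricting the input to the date-part columns works.
year_month = pd.to_datetime(df[["year", "month", "day"]]).dt.strftime("%Y%m")
print(year_month.tolist())
```

If the error appears, it is worth checking that the expression passed to `pd.to_datetime` is the column subset `monthly_df[['year','month','day']]` and not the whole DataFrame.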