Python pandas: how to groupby/pivot while keeping the NaNs? Converting float to str and then back to float works, but seems convoluted


I am tracking the month in which a certain event occurs. If it never occurs, the Month field is NaN. The starting table looks like this:

+-------+----------+---------+
| Month | Category | Balance |
+-------+----------+---------+
| 1     | a        |     100 |
| nan   | a        |     300 |
| 2     | a        |     200 |
+-------+----------+---------+

I am trying to build a crosstab like the following:

+-------+----------------------------------+
| Month | Category a - cumulative % amount |
+-------+----------------------------------+
|     1 |                             0.16 |
|     2 |                             0.50 |
+-------+----------------------------------+

In month 1, the event rate is 100/600, i.e. about 16%. By month 2, the cumulative event rate is (100+200)/600 = 50%, of which 100 occurred in month 1 and 200 in month 2.

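My problem is the NaNs. Pandas automatically drops NaN keys from any groupby/pivot/crosstab. I can convert the Month field to a string so that grouping on it does not drop the NaN, but then pandas sorts the months as strings, i.e. in the order 10, 48, 5, 6.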
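In pandas 1.1 and later, `groupby` accepts `dropna=False`, which keeps NaN as a group key of its own, so the string round trip may not be needed. The following is a minimal sketch on the three-row table from the question; the names `by_month` and `cum_share` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# The starting table from the question.
df = pd.DataFrame({
    "Month": [1, np.nan, 2],
    "Category": ["a", "a", "a"],
    "Balance": [100, 300, 200],
})

# dropna=False keeps the NaN month as its own group (pandas >= 1.1).
by_month = df.groupby("Month", dropna=False)["Balance"].sum()

# Make the ordering explicit: real months ascending, NaN last.
by_month = by_month.sort_index(na_position="last")

# Cumulative balance as a share of the total, NaN rows included in the total.
cum_share = by_month.cumsum() / by_month.sum()
print(cum_share)
```

Because the months stay numeric, they sort as numbers, and the NaN group still counts toward the denominator.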

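Any suggestions?

The approach below works, but it seems very convoluted:

I convert Month to a string
Build a crosstab
Convert the month back to float (can I do this without first moving the index to a column and then back to the index?)
Re-sort
Do the arithmetic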
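The steps above can be sketched as follows. This is not the asker's original code, which is truncated below; the data and the names `month_str`, `ct`, and `cum_share` are made up for illustration. It assumes the crosstab index can be replaced directly with `Index.astype(float)`, which avoids moving the index into a column and back.

```python
import numpy as np
import pandas as pd

# Made-up data: month 10 is included to show the sorting issue.
df = pd.DataFrame({
    "Month": [1, np.nan, 2, 10],
    "Category": ["a", "a", "a", "a"],
    "Balance": [100, 300, 200, 400],
})

# Step 1: convert Month to a string so NaN becomes the ordinary label 'nan'.
df["month_str"] = df["Month"].astype(str)

# Step 2: crosstab of summed balances; rows come out in string order.
ct = pd.crosstab(df["month_str"], df["Category"],
                 values=df["Balance"], aggfunc="sum")

# Step 3: convert the index back to float in place ('nan' parses to NaN).
ct.index = ct.index.astype(float)

# Step 4: re-sort numerically; NaN is placed last by default.
ct = ct.sort_index()

# Step 5: cumulative share of the total per category.
cum_share = ct.cumsum() / ct.sum()
print(cum_share)
```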

Code:

import numpy as np
import pandas as pd
df = pd.DataFrame()
mylen = int(10e3)
df['ix'] = np.arange(0, mylen)
df['amount'] = np.random.uniform(10e3, 20e3, mylen)
df['category'] = np.where(df['ix'] ...

Use:

If you want to create a DataFrame for each category, you can create a dict:
df_category = {i:group for i,group in new_df.groupby('Category')}

df['Category a - cumulative % amount'] = (
    df.groupby(by=df.Month.fillna(np.inf))
    .apply(lambda x: x.Balance.cumsum().div(df.Balance.sum()))
    .reset_index(level=0, drop=True)
)

df.dropna()

    Month   Category    Balance Category a - cumulative % amount
0   1       a           100     0.166667
2   2       a           200     0.333333
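Note that the snippet above accumulates within each month group, which is why month 2 shows 0.333333 rather than the 0.50 the question asks for. A possible variation, keeping the same `fillna(np.inf)` idea, is to sum per month first and then accumulate across months. This is a sketch under that assumption; the names `key`, `by_month`, `cum_share`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# The starting table from the question.
df = pd.DataFrame({
    "Month": [1, np.nan, 2],
    "Category": ["a", "a", "a"],
    "Balance": [100, 300, 200],
})

# Replace NaN with +inf: it survives the groupby and sorts after every real month.
key = df["Month"].fillna(np.inf)

# Total balance per month, sentinel group included.
by_month = df.groupby(key)["Balance"].sum()

# Accumulate across months and divide by the grand total.
cum_share = by_month.cumsum() / by_month.sum()

# Drop the sentinel row so only real months remain.
result = cum_share[np.isfinite(cum_share.index.to_numpy())]
print(result)
```

The sentinel keeps the month column numeric, so no string conversion or re-sorting is needed.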