Python cumulative sum using 2 columns


python pandas


I'm trying to create a column that computes a cumulative sum using 2 columns. Here is an example of what I'm trying to do: @Faith Akici

  index lodgement_year  words       sum  cum_sum
    0   2000            the          14     14
    1   2000            australia    10     10
    2   2000            word         12     12
    3   2000            brand         8      8
    4   2000            fresh         5      5
    5   2001            the           8      22
    6   2001            australia     3      13
    7   2001            banana        1       1
    8   2001            brand         7      15
    9   2001            fresh         1       6

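I used the code below, but my computer keeps crashing, and I'm not sure whether the problem is the code or the computer. Any help would be greatly appreciated: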

   df_2['cumsum']= df_2.groupby('lodgement_year')['words'].transform(pd.Series.cumsum)

Update: I also tried the code below, and it worked (the exit code was 0). However, it still raised some warnings.

df_2['cum_sum'] = df_2.groupby(['words'])['count'].cumsum()
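The two snippets above differ mainly in the grouping key. A minimal sketch on the question's sample table (using its 'sum' column in place of the real data's 'count'; that substitution is an assumption) shows why grouping by 'lodgement_year' cannot produce the desired 'cum_sum': the running total runs across different words within one year instead of following each word across years. The first snippet also applies cumsum to the string column 'words' rather than a numeric one; as far as I know, for object-dtype strings that concatenates them into ever-longer strings, which may plausibly explain the crashes on a large file.

```python
import pandas as pd

# Sample rows copied from the question's table ('sum' stands in for the real 'count' column)
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Grouping by year: the total restarts each year and accumulates across different words
by_year = df_2.groupby("lodgement_year")["sum"].cumsum()

# Grouping by word: the total follows each word across years (the desired cum_sum)
by_word = df_2.groupby("words")["sum"].cumsum()

print(by_year.tolist())  # [14, 24, 36, 44, 49, 8, 11, 12, 19, 20]
print(by_word.tolist())  # [14, 10, 12, 8, 5, 22, 13, 1, 15, 6]
```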

If only the 'words' column needs to be considered, we may need to loop through the unique values of 'words'.

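import pandas as pd
# Loop over each distinct word and compute the running total of its 'sum' values
for unique_words in df_2.words.unique():
    word_cumsum = df_2.loc[df_2['words'] == unique_words, 'sum'].cumsum()
    if 'cum_sum' not in df_2:
        # First word: create the column; rows of other words become NaN, so the column is float
        df_2['cum_sum'] = word_cumsum
    else:
        # Later words: fill in only the rows that belong to this word
        df_2.update(pd.DataFrame({'cum_sum': word_cumsum}))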

This results in:

>>> print(df_2)
  lodgement_year  sum      words  cum_sum
0           2000   14        the     14.0
1           2000   10  australia     10.0
2           2000   12       word     12.0
3           2000    8      brand      8.0
4           2000    5      fresh      5.0
5           2001    8        the     22.0
6           2001    3  australia     13.0
7           2001    1     banana      1.0
8           2001    7      brand     15.0
9           2001    1      fresh      6.0
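The loop gives the right numbers but scans the whole frame once per unique word, which gets expensive when there are many distinct words. A sketch comparing it with a single grouped cumsum on the sample data (rebuilt so the block runs on its own) also shows why the loop's output is float: the first assignment fills only one word's rows and leaves NaN everywhere else.

```python
import pandas as pd

# Sample rows from the question's table
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Loop version: one boolean scan of the whole frame per unique word
looped = df_2.copy()
for w in looped["words"].unique():
    word_cumsum = looped.loc[looped["words"] == w, "sum"].cumsum()
    if "cum_sum" not in looped:
        looped["cum_sum"] = word_cumsum  # other words' rows are NaN -> float64 column
    else:
        looped.update(pd.DataFrame({"cum_sum": word_cumsum}))

# Vectorized version: a single groupby pass that keeps the integer dtype
vectorized = df_2.groupby("words")["sum"].cumsum()

print(looped["cum_sum"].dtype, vectorized.dtype)
print((looped["cum_sum"] == vectorized).all())
```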

You're almost there, Ian.

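The cumsum() method computes the cumulative sum of a column. What you're looking for is that cumulative sum applied to groups of words. So: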
In [303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()

In [304]: df_2
Out[304]: 
   index  lodgement_year      words  sum  cum_sum  cumsum
0      0            2000        the   14       14      14
1      1            2000  australia   10       10      10
2      2            2000       word   12       12      12
3      3            2000      brand    8        8       8
4      4            2000      fresh    5        5       5
5      5            2001        the    8       22      22
6      6            2001  australia    3       13      13
7      7            2001     banana    1        1       1
8      8            2001      brand    7       15      15
9      9            2001      fresh    1        6       6
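Note that a grouped cumsum adds values in the order the rows appear, not in year order. The sample happens to be sorted by 'lodgement_year'; if the larger file is not (an assumption, the question doesn't say), sorting by year first keeps each running total chronological. This sketch reverses the sample rows to show the difference.

```python
import pandas as pd

# Question's sample rows, reversed so 2001 comes before 2000
# (as might happen in a large file that is not sorted by year)
df = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
}).iloc[::-1]

# cumsum accumulates in row order, so here the totals run backwards in time
naive = df.groupby("words")["sum"].cumsum()

# Sort by year first; a stable sort keeps the existing order within each year
ordered = df.sort_values("lodgement_year", kind="stable")
chronological = ordered.groupby("words")["sum"].cumsum()

# Both results keep the original row labels, so compare them in index order
print(naive.sort_index().tolist())          # [22, 13, 12, 15, 6, 8, 3, 1, 7, 1]
print(chronological.sort_index().tolist())  # [14, 10, 12, 8, 5, 22, 13, 1, 15, 6]
```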

If this runs into problems on your larger dataset, please leave a comment and we'll work out a more accurate version.
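Is it 'lodgement_year' or 'lodgement_date'?
It's lodgement_year (sorry). @Piintesky I just wanted to reference the previous question as background, but I'm happy to remove it.
Is 'lodgement_year' needed? Based on the sample output, the cumulative sum seems to depend only on 'words'. Can you show the expected output DataFrame?
Hi, sorry... the cum_sum above is the column I'm trying to create. The code below worked, but it printed some warnings at the top: df_2['cum_sum'] = df_2.groupby(['words'])['count'].cumsum()
Strangely, your code didn't run on my file. My file contains 120 years of data, with about 650k words per year; could that be a problem?
Can you try my code and give me some feedback?
Thanks for your support, as always. It worked, and the code I used, df_2['cum_sum'] = df_2.groupby(['words'])['count'].cumsum(), also worked. I'll make sure to spell names correctly in my next question :)
@FatihAkici, like this?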
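On the size question above (120 years at roughly 650k words per year): one common way to cut memory for a heavily repeated string column is the categorical dtype. This is a sketch under that assumption, not a diagnosis of why the code failed on the full file; the synthetic 'repeated' column is purely illustrative.

```python
import pandas as pd

# Question's sample rows
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Categorical storage keeps one copy of each distinct word plus a small integer code per row
df_2["words"] = df_2["words"].astype("category")

# observed=True groups only by categories that actually occur in the data
df_2["cum_sum"] = df_2.groupby("words", observed=True)["sum"].cumsum()
print(df_2["cum_sum"].tolist())  # [14, 10, 12, 8, 5, 22, 13, 1, 15, 6]

# Rough memory comparison on a synthetic column of heavily repeated words
repeated = pd.Series(["the", "australia", "brand", "fresh"] * 250_000)
as_object_bytes = repeated.memory_usage(deep=True)
as_category_bytes = repeated.astype("category").memory_usage(deep=True)
print(as_object_bytes, as_category_bytes)
```

The grouped result is the same as with plain strings; only the storage of the 'words' column changes.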