Python: summing the counts for each country per month
I have a dataset of 2000 rows in which each row holds a list of countries followed by a count. I want to explode the lists and total the counts for each country per month.
df_grouped=df.pivot_table(index=('month','month_int', 'year'),values='views',aggfunc='max')
count period_start year month_int month Countries
1 06/08/2018 2018 6 August []
1 06/08/2018 2018 6 August ['Spain', 'Brazil', 'Porgutal', 'France', 'Romania', 'Germany#', 'Norway']
1 06/08/2018 2018 6 August ['Spain', 'Brazil', 'Porgutal', 'France', 'Romania', 'Germany#', 'Norway']
1 06/08/2018 2018 6 August ['Porgutal', 'Canada', 'USA', 'Croatia', 'Egypt', 'Netherlands', 'Swizerland', 'Japan']
2 06/08/2018 2018 6 August ['China', 'India', 'Vietnam']
1 06/08/2018 2018 6 August ['Indai', ' Pakistan', 'Mongolia']
1 06/08/2018 2018 6 August ['Indai', ' Pakistan', 'Mongolia']
1 06/08/2018 2018 6 August ['Indai', ' Pakistan', 'Mongolia']
1 06/08/2018 2018 6 August []
1 06/08/2018 2018 6 August ['Germany', 'Spain', 'China', 'USA']
6 06/08/2018 2018 6 August ['Germany', 'Spain', 'China', 'USA']
1 06/08/2018 2018 6 Sept ['Germany', 'Spain', 'China', 'USA']
5 06/08/2018 2018 6 Sept ['Germany', 'Spain', 'China', 'USA']
4 06/08/2018 2018 6 Sept ['Germany', 'Spain', 'China', 'USA']
....
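I don't know how to explode the Countries column, add up the count from each row, and group by country.

Use .explode() and .groupby(). You need reset_index() to turn the result back into a DataFrame, and you should pass name='Countries Count' or any other name that differs from Countries; otherwise you get an error because the column name already exists: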
df = (df.explode('Countries')
.groupby(['year','month','Countries'])['Countries'].count().reset_index(name='Countries Count'))
df
Out[1]:
year month Countries Countries Count
0 2018 August Pakistan 3
1 2018 August Brazil 2
2 2018 August Canada 1
3 2018 August China 3
4 2018 August Croatia 1
5 2018 August Egypt 1
6 2018 August France 2
7 2018 August Germany 2
8 2018 August Germany# 2
9 2018 August Indai 3
10 2018 August India 1
11 2018 August Japan 1
12 2018 August Mongolia 3
13 2018 August Netherlands 1
14 2018 August Norway 2
15 2018 August Porgutal 3
16 2018 August Romania 2
17 2018 August Spain 4
18 2018 August Swizerland 1
19 2018 August USA 3
20 2018 August Vietnam 1
21 2018 Sept China 3
22 2018 Sept Germany 3
23 2018 Sept Spain 3
24 2018 Sept USA 3
Can you show us what you have tried?
What I want is for "Countries Count" to be the sum of a country's counts for that month. Germany should have 10 for Sept.
@Arron Does this give you the right answer? df = (df.explode('Countries').groupby(['year','month','Countries'])['count'].count().reset_index(name='Countries count'))
Even that reports "column not found - count".
It may be in the index. Before the code above, do df = df.reset_index()
I tried replacing ['count'].count() with .agg(['count']). That doesn't work either.
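
Going by the comments above, the wanted "Countries Count" is the total of the count column per country and month, not the number of rows, so .sum() on 'count' looks like a better fit than .count(). Here is a minimal sketch of that variant. It assumes Countries holds real Python lists and that count is an ordinary column; the small sample frame is made up for illustration and totals is a hypothetical name:

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout (not the real 2000-row data).
df = pd.DataFrame({
    "count": [1, 6, 1, 5, 4],
    "year": [2018] * 5,
    "month": ["August", "August", "Sept", "Sept", "Sept"],
    "Countries": [
        [],
        ["Germany", "Spain", "China", "USA"],
        ["Germany", "Spain", "China", "USA"],
        ["Germany", "Spain", "China", "USA"],
        ["Germany", "Spain", "China", "USA"],
    ],
})

# If 'count', 'year' or 'month' ended up in the index (for example after
# pivot_table), move them back to regular columns first:
# df = df.reset_index()

totals = (
    df.explode("Countries")           # one row per country; an empty list becomes NaN
      .dropna(subset=["Countries"])   # discard rows whose list was empty
      .groupby(["year", "month", "Countries"])["count"]
      .sum()                          # add up the counts instead of counting rows
      .reset_index(name="Countries Count")
)
print(totals)
```

With this sample, Germany gets 1 + 5 + 4 = 10 for Sept, which matches the figure asked for in the comments. If the real column stores the lists as strings, they would have to be parsed into lists before .explode() does anything useful.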