Python: How to count substring occurrences using groupby (tags: python, pandas, pandas-groupby)


I start with input data like this:

email                               country_code
12345kinglobito94@hotmail.com           RU
12345arturdyikan6211@gmail.com          RU
12345leonardosebastianld.20@gmail.com   PE
12345k23156876vs@hotmail.com            RU
12345jhuillcag@hotmail.com              PE
12345ergasovaskazon72@gmail.com         RU
12345myrzadaevajrat@gmail.com           RU
12345filomena@hotmail.com               BR
12345jppicotajose20@hotmail.com         BR
...                                    ...

When printed, the DataFrame looks like this:

                                      email country_code
0            12345kinglobito94@hotmail.com           RU
1           12345arturdyikan6211@gmail.com           RU
2    12345leonardosebastianld.20@gmail.com           PE
3             12345k23156876vs@hotmail.com           RU
4               12345jhuillcag@hotmail.com           PE
5          12345ergasovaskazon72@gmail.com           RU
6            12345myrzadaevajrat@gmail.com           RU
7                12345filomena@hotmail.com           BR
8          12345jppicotajose20@hotmail.com           BR
...                                                 ...

Grouping by country and counting the e-mail addresses is straightforward:

country_code
AR     21
BR    340
PE    198
RU    402
US     39
Name: email, dtype: int64
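
The call that produced this output is not shown in the post. A minimal sketch of what it presumably looks like, assuming the DataFrame is named df (as in the answers) and using only the nine sample rows listed above, so the counts differ from the full dataset:

```python
import pandas as pd

# Sample rows copied from the question; the full dataset is not available.
df = pd.DataFrame({
    "email": [
        "12345kinglobito94@hotmail.com",
        "12345arturdyikan6211@gmail.com",
        "12345leonardosebastianld.20@gmail.com",
        "12345k23156876vs@hotmail.com",
        "12345jhuillcag@hotmail.com",
        "12345ergasovaskazon72@gmail.com",
        "12345myrzadaevajrat@gmail.com",
        "12345filomena@hotmail.com",
        "12345jppicotajose20@hotmail.com",
    ],
    "country_code": ["RU", "RU", "PE", "RU", "PE", "RU", "RU", "BR", "BR"],
})

# Count non-null e-mail addresses per country; the result is a Series
# named "email" and indexed by country_code.
per_country = df.groupby("country_code")["email"].count()
print(per_country)
```

Selecting the email column before calling count() is what gives the result the name "email" seen in the output.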

But I want to count how many hotmail and gmail addresses there are in each country.

Use a regex to extract the domain name into a new column, then group on country and domain and call groupby().size().
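
The code for this answer did not survive in the post. The following is a sketch of the approach it describes, not the answerer's exact code: the column name domain and the sample rows (taken from the question) are assumptions made for illustration.

```python
import pandas as pd

# Sample rows copied from the question; the full dataset is not available.
df = pd.DataFrame({
    "email": [
        "12345kinglobito94@hotmail.com",
        "12345arturdyikan6211@gmail.com",
        "12345leonardosebastianld.20@gmail.com",
        "12345k23156876vs@hotmail.com",
        "12345jhuillcag@hotmail.com",
        "12345ergasovaskazon72@gmail.com",
        "12345myrzadaevajrat@gmail.com",
        "12345filomena@hotmail.com",
        "12345jppicotajose20@hotmail.com",
    ],
    "country_code": ["RU", "RU", "PE", "RU", "PE", "RU", "RU", "BR", "BR"],
})

# Capture the text between "@" and the first dot after it,
# e.g. "hotmail" from "someone@hotmail.com".
# "domain" is a hypothetical column name chosen for this sketch.
df["domain"] = df["email"].str.extract(r"@(.*?)\.", expand=False)

# size() counts the rows in each (country_code, domain) group.
counts = df.groupby(["country_code", "domain"]).size()
print(counts)
```

With expand=False and a single capture group, str.extract returns a Series, which can be assigned directly as a column.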

If you don't want to add a column, you can also pass the extracted Series directly as a grouping key:

df.groupby(["country_code", df['email'].str.extract(r'@(.*?)\.', expand=False)]).size()

We can also use str.replace(), but I think @Dark's variant is more idiomatic:

In [17]: (df.assign(domain=df['email'].str.replace(r'.*?@(.*?)\.\w+', r'\1', regex=True))
    ...:    .groupby(['country_code', 'domain'])['email']
    ...:    .count()
    ...:    .reset_index(name='count'))
    ...:
Out[17]:
  country_code   domain  count
0           BR  hotmail      2
1           PE    gmail      1
2           PE  hotmail      1
3           RU    gmail      3
4           RU  hotmail      2