How do I group by character in Python?

I have a large DataFrame like this:

Name       Gender
Leo         Male
Lilly       Female
Angela      Female
Donald      Male
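I want to find the most common characters in the names for each gender.

So I want to group by character. Something like this (this code is wrong, it is only an example of what I want):

df.groupby('NameCharacter')['gender'].value_counts()

The expected output is something like this (not in this format, just to give you an idea of the expected information):

etc.

I managed this with a for loop, but it takes a lot of time and is overly complex.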

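Here you go:

import numpy as np
import pandas as pd

# Read the sample table from the question off the clipboard.
df = pd.read_clipboard()

x = df.groupby('Gender')
for key, item in x:
    # Join every name in this gender group into a single string.
    d = ''.join(item['Name'].tolist())

    # Count each distinct character in the joined string.
    chars = np.unique(list(d))
    for c in chars:
        print(c,' appeared ',d.count(c),'times in ',key)
Output (not exactly the format you asked for, but it gives you what you need):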

A  appeared  1 times in  Female
L  appeared  1 times in  Female
a  appeared  1 times in  Female
e  appeared  1 times in  Female
g  appeared  1 times in  Female
i  appeared  1 times in  Female
l  appeared  3 times in  Female
n  appeared  1 times in  Female
y  appeared  1 times in  Female
D  appeared  1 times in  Male
L  appeared  1 times in  Male
a  appeared  1 times in  Male
d  appeared  1 times in  Male
e  appeared  1 times in  Male
l  appeared  1 times in  Male
n  appeared  1 times in  Male
o  appeared  2 times in  Male
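
Since the question asks for something closer to a real groupby on characters, here is a minimal sketch of one way to do it without an explicit loop. It assumes pandas 0.25 or later (for explode) and uses the sample data from the question. The column name NameCharacter is borrowed from the question's pseudocode and is otherwise hypothetical. This is an illustration, not the method used in the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Leo", "Lilly", "Angela", "Donald"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# Split each name into a list of characters, then give every character its own row.
# "NameCharacter" is a hypothetical column name taken from the question's pseudocode.
chars = (
    df.assign(NameCharacter=df["Name"].apply(list))
      .explode("NameCharacter")
      .reset_index(drop=True)
)

# Count how often each character occurs within each gender.
counts = chars.groupby("Gender")["NameCharacter"].value_counts()
print(counts)

# Pivot genders into columns (characters as rows), then pick the
# most frequent character in each gender column.
table = counts.unstack(level=0, fill_value=0)
most_common = table.idxmax()
print(most_common)
```

Upper- and lower-case letters are counted separately here, as in the loop above. If that is not wanted, lowercasing the names first with df["Name"].str.lower() would be one option.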

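Here is a possible solution. It uses Pandas indexing to separate the male and female names, joins each set into a single string, and then counts the characters in each name string.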

import pandas as pd
from collections import Counter

df = pd.DataFrame({'Name':['Leo', 'Lily', 'Angela'], 'Gender':['Male', 'Female', 'Female']})

male_name_string = ''.join(df.loc[df['Gender'] == 'Male', 'Name'])
female_name_string = ''.join(df.loc[df['Gender'] == 'Female', 'Name'])

male_char_count = Counter(male_name_string)
female_char_count = Counter(female_name_string)

unique_char = set(list(male_char_count.keys()) + list(female_char_count.keys()))
for c in unique_char:
    print(f'{c} found {female_char_count[c]} times in female and {male_char_count[c]} times in male')
Output (the line order may differ between runs, because set iteration order is not fixed):

n found 1 times in female and 0 times in male
e found 1 times in female and 1 times in male
g found 1 times in female and 0 times in male
A found 1 times in female and 0 times in male
a found 1 times in female and 0 times in male
L found 1 times in female and 1 times in male
o found 0 times in female and 1 times in male
l found 2 times in female and 0 times in male
y found 1 times in female and 0 times in male
i found 1 times in female and 0 times in male
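
The same idea can be written without hardcoding each gender. The sketch below is an assumption-laden variation, not part of the original answer: it uses the four sample rows from the question rather than the three rows above, builds one Counter per group found in the Gender column, and uses Counter.most_common to answer the "most common character" part directly.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question (not the shorter sample used above).
df = pd.DataFrame({
    "Name": ["Leo", "Lilly", "Angela", "Donald"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# For each gender, join all names into one string and count its characters.
char_counts = {
    gender: Counter("".join(names))
    for gender, names in df.groupby("Gender")["Name"]
}

# Report the single most frequent character per gender.
for gender, counter in char_counts.items():
    char, n = counter.most_common(1)[0]
    print(f"{gender}: most common character is {char!r} ({n} times)")
```

With this data, the most common character is 'l' (3 times) for Female and 'o' (2 times) for Male.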

Please share your for-loop code. When I first saw it, I was as surprised as you.