Python: counting the most common values in a column with pandas


I have a CSV like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S

I need to count the most popular male and female names. I can do it like this:

for names in data['Name']:
    name = names.split(', ')
    print(name[0])

But is there a way to do this using only pandas?

I think you can first parse the names into a new Series (ser), then group by the Sex column together with ser and apply count and nlargest.

This is the same as using a helper column all_names:

data['all_names'] = data['Name'].str.split(',').str[0]
print(data)

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Futrelle, Mrs. John Bradley (Florence Briggs T...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   
5                                   Moran, Mr. James    male   NaN      0   
6                            McCarthy, Mr. Timothy J    male  54.0      0   
7                      Braund, Master. Gosta Leonard    male   2.0      3   

   Parch            Ticket     Fare Cabin Embarked  all_names  
0      0         A/5 21171   7.2500   NaN        S     Braund  
1      0          PC 17599  71.2833   C85        C   Futrelle  
2      0  STON/O2. 3101282   7.9250   NaN        S  Heikkinen  
3      0            113803  53.1000  C123        S   Futrelle  
4      0            373450   8.0500   NaN        S      Allen  
5      0            330877   8.4583   NaN        Q      Moran  
6      0             17463  51.8625   E46        S   McCarthy  
7      1            349909  21.0750   NaN        S     Braund  

As for splitting the names, I think the best way is df.apply, but that is not much better than a loop.

Can you read it into a df and group it with df = pd.read_csv('comma.csv').groupby(['Name', 'Sex'], sort=False).max()?
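
To illustrate the comment about df.apply, here is a small sketch (Python 3) comparing the vectorised .str accessor with Series.apply. The three-row sample Series, including its missing value, and the helper name first_part are assumptions made up for illustration; they are not from the original thread.

```python
import pandas as pd

# Hypothetical sample; the last entry is missing to show how each approach copes.
names = pd.Series(["Braund, Mr. Owen Harris", "Moran, Mr. James", None])

# Vectorised string methods skip missing values and leave them missing.
via_str = names.str.split(",").str[0]


def first_part(value):
    """Return the text before the first comma, passing missing values through."""
    return value.split(",")[0] if isinstance(value, str) else value


# apply calls the Python function once per element, much like an explicit loop.
via_apply = names.apply(first_part)

print(via_str.tolist())
print(via_apply.tolist())
```

Without the isinstance check, the apply version would fail on the missing entry, which is one reason the .str form tends to be the more convenient choice here.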

Parsing the names into a Series:

ser = data['Name'].str.split(',').str[0]
print(ser)
0       Braund
1     Futrelle
2    Heikkinen
3     Futrelle
4        Allen
5        Moran
6     McCarthy
7       Braund
Name: Name, dtype: object

print(ser.groupby([data['Sex'], ser]).count())
Sex     Name     
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
dtype: int64

print(ser.groupby([data['Sex'], ser]).count().nlargest(4))
Sex     Name     
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
dtype: int64

Grouping on the all_names helper column gives the same counts:

print(data.groupby(['Sex', 'all_names'])['all_names'].count())
Sex     all_names
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
Name: all_names, dtype: int64

print(data.groupby(['Sex', 'all_names'])['all_names'].count().nlargest(4))
Sex     all_names
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
Name: all_names, dtype: int64
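
For readers on Python 3, the following is a minimal, self-contained sketch of the same idea. It rebuilds only the Name and Sex columns from the sample output above rather than reading the full CSV, and the column name surname and the variable names are illustrative assumptions, not part of the original answer. It also picks the single most frequent name for each sex, which nlargest(4) on the combined counts does not do on its own.

```python
import pandas as pd

# Name and Sex values copied from the sample output above (other columns omitted).
data = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
        "Moran, Mr. James",
        "McCarthy, Mr. Timothy J",
        "Braund, Master. Gosta Leonard",
    ],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
})

# Text before the first comma; "surname" is a hypothetical column name.
data["surname"] = data["Name"].str.split(",").str[0]

# Number of rows for every (Sex, surname) pair.
counts = data.groupby(["Sex", "surname"]).size()

# Most frequent surname within each sex (the first one wins if there is a tie).
top_per_sex = data.groupby("Sex")["surname"].agg(
    lambda s: s.value_counts().idxmax()
)

print(counts)
print(top_per_sex)
```

size() counts rows per group, which matches count() here because the name column has no missing values in this sample.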