Python: Counting the most common values in a column with pandas
I have a CSV file like this:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
I need to count the most popular names among men and women.
I can do it like this:
for names in data['Name']:
    # The surname is the part before the first ", "
    name = names.split(', ')
    print(name[0])
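But is there a way to do this using only pandas?

I think you can first parse the names into a new Series, ser, then group it by the Sex column together with ser itself and apply count() and nlargest(). Note that the sample output uses slightly modified data (the surnames in rows 1 and 7 differ from the CSV above) so that some surnames repeat.
This is the same as using a helper column all_names: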
# Helper column holding the surname (the text before the first comma)
data['all_names'] = data['Name'].str.split(',').str[0]
print(data)
                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Futrelle, Mrs. John Bradley (Florence Briggs T...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0
5                                   Moran, Mr. James    male   NaN      0
6                            McCarthy, Mr. Timothy J    male  54.0      0
7                      Braund, Master. Gosta Leonard    male   2.0      3

   Parch            Ticket     Fare  Cabin  Embarked  all_names
0      0         A/5 21171   7.2500    NaN         S     Braund
1      0          PC 17599  71.2833    C85         C   Futrelle
2      0  STON/O2. 3101282   7.9250    NaN         S  Heikkinen
3      0            113803  53.1000   C123         S   Futrelle
4      0            373450   8.0500    NaN         S      Allen
5      0            330877   8.4583    NaN         Q      Moran
6      0             17463  51.8625    E46         S   McCarthy
7      1            349909  21.0750    NaN         S     Braund
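As for splitting the names, I think the best approach is df.apply, but that is not much better than a loop. Could you read the file into a DataFrame and group it, e.g. df = pd.read_csv('comma.csv').groupby(['Name', 'Sex'], sort=False).max()?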
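To illustrate the comment above, here is a minimal sketch comparing the vectorized .str accessor with Series.apply. The small names Series is made up for illustration and stands in for data['Name']; it is an assumption, not part of the original thread.

```python
import pandas as pd

# Hypothetical stand-in for data['Name'].
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Allen, Mr. William Henry",
])

# Vectorized string accessor: split on the comma, keep the first piece.
via_str = names.str.split(',').str[0]

# Element-wise apply: the same split written as a plain Python function.
via_apply = names.apply(lambda full_name: full_name.split(',')[0])

# Both routes should yield the same surnames.
print(via_str.tolist() == via_apply.tolist())
```

One practical difference: the .str methods pass missing values through as NaN, whereas this lambda would raise an error on a NaN entry.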
# Surname = the text before the first comma
ser = data['Name'].str.split(',').str[0]
print(ser)
0 Braund
1 Futrelle
2 Heikkinen
3 Futrelle
4 Allen
5 Moran
6 McCarthy
7 Braund
Name: Name, dtype: object
# Count surnames within each sex
print(ser.groupby([data['Sex'], ser]).count())
Sex     Name
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
dtype: int64
# Keep the four largest counts overall
print(ser.groupby([data['Sex'], ser]).count().nlargest(4))
Sex     Name
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
dtype: int64
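For readers who want to run the Series approach end to end, the following is a self-contained sketch under Python 3. The inline CSV is an assumption: it keeps only the Name and Sex columns and uses the surnames from the answer's output rather than the question's file. It also shows SeriesGroupBy.value_counts as one possible alternative to count(); that alternative is not from the original answer.

```python
import io
import pandas as pd

# Assumed sample data: Name and Sex only, surnames as in the answer's output.
csv_text = '''Name,Sex
"Braund, Mr. Owen Harris",male
"Futrelle, Mrs. John Bradley (Florence Briggs Thayer)",female
"Heikkinen, Miss. Laina",female
"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female
"Allen, Mr. William Henry",male
"Moran, Mr. James",male
"McCarthy, Mr. Timothy J",male
"Braund, Master. Gosta Leonard",male
'''
data = pd.read_csv(io.StringIO(csv_text))

# Surname = the text before the first comma.
ser = data['Name'].str.split(',').str[0]

# Approach from the answer: group by sex and surname, then count.
counts = ser.groupby([data['Sex'], ser]).count()

# Alternative sketch: group by sex only and let value_counts tally surnames.
alt_counts = ser.groupby(data['Sex']).value_counts()

print(counts)
print(counts.to_dict() == alt_counts.to_dict())
```

The value_counts variant returns the counts already sorted in descending order within each sex, which can be convenient when only the top entries matter.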
# Same counts, this time grouping on the helper column all_names created earlier
print(data.groupby(['Sex', 'all_names'])['all_names'].count())
Sex     all_names
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
Name: all_names, dtype: int64
# Keep the four largest counts overall
print(data.groupby(['Sex', 'all_names'])['all_names'].count().nlargest(4))
Sex     all_names
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
Name: all_names, dtype: int64
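Note that nlargest(4) ranks the counts across both sexes at once. If the goal is the single most popular name for each sex, one possible sketch is to sort the counts and keep the first row per sex. The small DataFrame below is an assumption that mirrors the Sex and all_names columns shown above; the per-sex selection step is not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature frame with the helper column already in place.
data = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female',
            'male', 'male', 'male', 'male'],
    'all_names': ['Braund', 'Futrelle', 'Heikkinen', 'Futrelle',
                  'Allen', 'Moran', 'McCarthy', 'Braund'],
})

# Number of rows for every (Sex, all_names) pair.
counts = data.groupby(['Sex', 'all_names']).size()

# Sort by count (largest first), then keep the first row of each sex.
top_per_sex = (
    counts.sort_values(ascending=False)
          .groupby(level='Sex')
          .head(1)
)
print(top_per_sex)
```

If two names tie for first place within a sex, this sketch keeps only one of them; use head(n) with a larger n to see the runners-up.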