Pandas str.contains and str.find give different results

It seems to me that these two should give the same answer:

import pandas as pd

train = pd.read_csv('https://raw.github.com/mattdelhey/kaggle-titanic/master/Data/train.csv')
train.name.str.contains('Mr.').sum()
(train.name.str.find('Mr.')>0).sum()
But the output is:

647
517

What is the reason behind the different results?

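The difference is that str.contains also matches Mrs., because . is a special regex character (it matches any character) and str.contains treats its pattern as a regular expression by default.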

I think you need to escape it or pass the parameter regex=False:

print(train.name.str.contains(r'Mr\.').sum())
517
print(train.name.str.contains('Mr.', regex=False).sum())
517
print((train.name.str.find('Mr.')>0).sum())
517
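The same behaviour can be reproduced without downloading the dataset. The sketch below uses a few made-up names (they are assumptions for illustration, not rows from the Titanic file) and compares the regex match, the literal match, and str.find. It also shows a side effect of testing find(...) > 0: str.find returns -1 when nothing is found and 0 when the match sits at the very start of the string, so > 0 would skip such a row, whereas >= 0 would not. In the Titanic names the title always follows the surname, which is probably why the counts above still agree.

```python
import pandas as pd

# Hypothetical sample names, chosen only to illustrate the matching rules.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
    "Mr. Smith",
])

# Default regex=True: '.' matches any character, so 'Mrs' matches as well.
regex_hits = names.str.contains("Mr.")

# Literal matching: only the exact substring 'Mr.' counts.
literal_hits = names.str.contains("Mr.", regex=False)

# Escaping the dot in a raw string has the same effect as regex=False here.
escaped_hits = names.str.contains(r"Mr\.")

# str.find returns the position of the literal substring, or -1 if absent.
positions = names.str.find("Mr.")
found_gt0 = positions > 0    # skips a match at position 0
found_ge0 = positions >= 0   # counts every match

print(regex_hits.sum(), literal_hits.sum(), escaped_hits.sum())
print(positions.tolist())
print(found_gt0.sum(), found_ge0.sum())
```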
To inspect the difference:

a = train.loc[train.name.str.contains('Mr.'), 'name']
b = train.loc[(train.name.str.find('Mr.')>0), 'name']


c = pd.concat([a, b], axis=1, keys=('contains','find'))
c = c[c.isnull().any(axis=1)]
print(c)
                                              contains find
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  NaN
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  NaN
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  NaN
9                  Nasser, Mrs. Nicholas (Adele Achem)  NaN
15                    Hewlett, Mrs. (Mary D Kingcome)   NaN
18   Vander Planke, Mrs. Julius (Emelia Maria Vande...  NaN
19                             Masselmani, Mrs. Fatima  NaN
25   Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...  NaN
31      Spencer, Mrs. William Augustus (Marie Eugenie)  NaN
40      Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  NaN
41   Turpin, Mrs. William John Robert (Dorothy Ann ...  NaN
49       Arnold-Franchi, Mrs. Josef (Josefine Franchi)  NaN
52            Harper, Mrs. Henry Sleeper (Myna Haxtun)  NaN
53   Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin...  NaN
66                        Nye, Mrs. (Elizabeth Ramell)  NaN
85   Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...  NaN
...
...
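An alternative way to list the rows that only one method matches is to compare the two boolean masks directly instead of concatenating and filtering on NaN. This is a minimal sketch on made-up names (assumed for illustration, not taken from the dataset); the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample names, used only to illustrate the comparison.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath",
], name="name")

# Mask from the regex match ('.' matches any character).
by_regex = names.str.contains("Mr.")

# Mask from the literal substring search.
by_find = names.str.find("Mr.") >= 0

# Rows matched by the regex but not by the literal search.
only_regex = names[by_regex & ~by_find]
print(only_regex)
```

On the real data, the same expression with train.name in place of names should list the Mrs. rows shown in the table above.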

Thank you very much. I was about to ask exactly the question you addressed in your edit.