Python: df.col.replace() vs. df.col.str.replace()


I need to remove the double quotes, i.e. replace '"' with ''. So I tried the following:
Approach 1

test.Name = test.Name.replace('"', '')
test_label.name = test_label.name.replace('"', '')
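Both DataFrames contain the same values, so if I compare the values of the two columns I should get an empty set. To my surprise, it was not empty. I tried the following: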

I could still see the double quotes (") in the values, which means the replacement did not work. So I tried another approach.
Approach 2
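The following is a minimal reproduction of Approach 1. It uses a hypothetical three-element Series s rather than the asker's test data. With its default arguments, Series.replace compares each whole cell with '"' and only replaces cells that are exactly equal to it. Strings that merely contain a quote come back unchanged.

```python
import pandas as pd

# Hypothetical sample data: two strings that contain quotes, and one cell that is exactly '"'
s = pd.Series(['Harry""', 'Lily" Watson"', '"'])

# Series.replace with the default regex=False matches whole values, not substrings,
# so only the third cell (which equals '"' exactly) is replaced
result = s.replace('"', '')
print(result.tolist())  # ['Harry""', 'Lily" Watson"', '']
```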

test.Name = test.Name.str.replace('"', '', regex=False)
test_label.name = test_label.name.str.replace('"', '', regex=False)

set(test.Name) - set(test_label.name)
set()
The second approach gave the result I expected. So my question is: why doesn't df.col.replace() replace these values?


By inspection, we can determine the types of df.Name and df.Name.str:

print(type(df.Name))      # <class 'pandas.core.series.Series'>
print(type(df.Name.str))  # <class 'pandas.core.strings.StringMethods'> (pandas >= 1.2 reports pandas.core.strings.accessor.StringMethods)
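The type difference matters because the two methods come from different places. Series.replace is a general value-replacement method that works on any dtype. The .str accessor is a StringMethods object that applies string operations element by element, and it is only available for string-like data. The sketch below uses hypothetical Series names and nums to illustrate both points.

```python
import pandas as pd

names = pd.Series(['Harry""', 'Maggie""'])  # hypothetical string data
nums = pd.Series([1, 2, 3])                 # hypothetical integer data

print(type(names).__name__)      # Series
print(type(names.str).__name__)  # StringMethods

# Series.replace compares whole values, so it works on non-string dtypes too
print(nums.replace(2, 20).tolist())  # [1, 20, 3]

# The .str accessor raises AttributeError for non-string data, so hasattr returns False
print(hasattr(nums, "str"))  # False
```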
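Note that the two methods do different things. Series.replace replaces whole values. With its default regex=False, a cell is replaced only if it is exactly equal to '"', so strings that merely contain a double quote are left unchanged. Series.str.replace instead performs a substring replacement inside each element, so every occurrence of '"' is removed. Its regex parameter defaulted to True before pandas 2.0 and defaults to False since 2.0, but for a plain '"' the result is the same either way. Therefore, to remove the double quotes with Series.replace, you must set its regex parameter to True, which makes it search for the pattern inside each string.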
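The following is a short sketch of the fix on two names taken from the example data. It checks that Series.replace with regex=True gives the same result as Series.str.replace. regex is passed explicitly to str.replace so that the result does not depend on the pandas version's default.

```python
import pandas as pd

s = pd.Series([
    'Cotterill, Mr. Henry Harry""',
    'Coutts, Mrs. William (Winnie Minnie" Treanor)"',
])

# regex=True: '"' is treated as a pattern and every match inside each string is substituted
via_replace = s.replace('"', '', regex=True)

# str.replace with regex=False: plain substring replacement, like Python's str.replace
via_str = s.str.replace('"', '', regex=False)

print(via_replace.tolist())
print(via_replace.equals(via_str))  # True
```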

Below is an example comparing the results of Series.replace with regex=False, Series.replace with regex=True, and Series.str.replace:

import pandas as pd

data = { 

        'Name': 
                [
                    'Assaf Khalil, Mrs. Mariana (Miriam")"',
                    'Cotterill, Mr. Henry Harry""',
                    'Coutts, Mrs. William (Winnie Minnie" Treanor)"',
                    'Daly, Miss. Margaret Marcella Maggie""',
                    'Dean, Miss. Elizabeth Gladys Millvina""',
                    'Hocking, Miss. Ellen Nellie""',
                    'Johnston, Master. William Arthur Willie""',
                    'Johnston, Mrs. Andrew G (Elizabeth Lily" Watson)"',
                    'Katavelas, Mr. Vassilios (Catavelas Vassilios")"',
                    'Khalil, Mrs. Betros (Zahie Maria" Elias)"',
                    'Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey")"',
                    'McCarthy, Miss. Catherine Katie""',
                    'Moubarek, Mrs. George (Omine Amenia" Alexander)"',
                    'Nakid, Mrs. Said (Waika Mary" Mowad)"',
                    'Nourney, Mr. Alfred (Baron von Drachstedt")"',
                    'Riihivouri, Miss. Susanna Juhantytar Sanni""',
                    'Riordan, Miss. Johanna Hannah""',
                    'Rosenshine, Mr. George (Mr George Thorne")"',
                    'Thomas, Mrs. Alexander (Thamine Thelma")"',
                    'Wells, Mrs. Arthur Henry (Addie" Dart Trevaskis)"',
                    'Wheeler, Mr. Edwin Frederick""',
                    'Willer, Mr. Aaron (Abi Weller")"'
        ]
}

df1 = pd.DataFrame.from_dict(data)
df2 = pd.DataFrame.from_dict(data)
df3 = pd.DataFrame.from_dict(data)

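# regex=True: removes every '"' inside each string
df1.Name = df1.Name.replace('"', '', regex=True)
# regex=False: only replaces cells that are exactly '"', so nothing changes here
df2.Name = df2.Name.replace('"', '', regex=False)
# .str.replace: substring replacement in each element (regex passed explicitly for version independence)
df3.Name = df3.Name.str.replace('"', '', regex=False)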

print("df1 equals df2?:", df1.equals(df2))
print("df1 equals df3?:", df1.equals(df3))
print(set(df1.Name) - set(df2.Name))
print(set(df1.Name) - set(df3.Name))
Output:

df1 equals df2?: False
df1 equals df3?: True
{'Moubarek, Mrs. George (Omine Amenia Alexander)', 'McCarthy, Miss. Catherine Katie', 'Cotterill, Mr. Henry Harry', 'Katavelas, Mr. Vassilios (Catavelas Vassilios)', 'Coutts, Mrs. William (Winnie Minnie Treanor)', 'Hocking, Miss. Ellen Nellie', 'Wheeler, Mr. Edwin Frederick', 'Thomas, Mrs. Alexander (Thamine Thelma)', 'Johnston, Mrs. Andrew G (Elizabeth Lily Watson)', 'Dean, Miss. Elizabeth Gladys Millvina', 'Willer, Mr. Aaron (Abi Weller)', 'Nourney, Mr. Alfred (Baron von Drachstedt)', 'Wells, Mrs. Arthur Henry (Addie Dart Trevaskis)', 'Assaf Khalil, Mrs. Mariana (Miriam)', 'Daly, Miss. Margaret Marcella Maggie', 'Johnston, Master. William Arthur Willie', 'Riihivouri, Miss. Susanna Juhantytar Sanni', 'Rosenshine, Mr. George (Mr George Thorne)', 'Nakid, Mrs. Said (Waika Mary Mowad)', 'Riordan, Miss. Johanna Hannah', 'Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey)', 'Khalil, Mrs. Betros (Zahie Maria Elias)'}
set()
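One caveat when switching Series.replace to regex=True is that to_replace is then interpreted as a regular expression. A double quote has no special meaning in a regex, so it is safe. A character such as '.' is different, because it matches any character. The sketch below uses a hypothetical Series s. It shows how the unescaped pattern wipes out whole strings, and how re.escape (or str.replace with regex=False) restores the literal meaning.

```python
import re
import pandas as pd

s = pd.Series(['Mr. Henry', 'Mrs. Lily'])  # hypothetical data

# Unescaped '.' matches every character, so each string becomes empty
unescaped = s.replace('.', '', regex=True)
print(unescaped.tolist())  # ['', '']

# re.escape turns '.' into a literal-dot pattern
escaped = s.replace(re.escape('.'), '', regex=True)
print(escaped.tolist())  # ['Mr Henry', 'Mrs Lily']

# str.replace with regex=False needs no escaping
literal = s.str.replace('.', '', regex=False)
print(literal.tolist())  # ['Mr Henry', 'Mrs Lily']
```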
