Python: applying a function on a DataFrame won't work, AttributeError in the function

python pandas dataframe

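I'm trying to apply a function that returns a "cleaned" email value. However, I'm having trouble applying my function to the respective column.

Please recommend the best method.

Sample data: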

import numpy as np
import pandas as pd

sample_data = {'email': ['Sam@mail.com', 'Sam@mail.com',
                         'Doug@mail.com', 'Doug@mail.com',
                         np.nan, np.nan],
               'price': [25.95, 31.25, 34.95, 19.95, 59.95, 15.75]}

sample_df = pd.DataFrame(sample_data)

print(sample_df)
    email   price
0   Sam@mail.com    25.95
1   Sam@mail.com    31.25
2   Doug@mail.com   34.95
3   Doug@mail.com   19.95
4   NaN     59.95
5   NaN     15.75

Applying the function:

def clean_emails(s):
    emails = {x: str(x).lower() for x in s.unique()}
    return s.map(emails)

# Passing the column directly into the function works
sample_df.email = clean_emails(sample_df.email)

# So does passing the entire df into an apply statement
sample_df = sample_df.apply(clean_emails)

print(sample_df)

    email   price
0   sam@mail.com    25.95
1   sam@mail.com    31.25
2   doug@mail.com   34.95
3   doug@mail.com   19.95
4   nan     59.95
5   nan     15.75

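As shown, passing the column directly into the function works, and so does applying the function to the entire df. My concern is with larger datasets, where I would pass a single column to the function.

In short, is passing a single column of the df to the function the best way to handle this? Or can `apply` be used?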

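The function you are using calls `unique()`, which is not an attribute of a DataFrame. It looks like you meant to apply it to a Series, not a DataFrame.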
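To illustrate the distinction, here is a small sketch. The frame `df` and its values are made up for illustration, and the failing `Series.apply` call is an assumption about how such an AttributeError can arise, since the question does not show the failing line. `DataFrame.apply` hands each column to the function as a Series, so `s.unique()` is available, whereas `Series.apply` hands over one element at a time, and a plain `str` has no `unique` method.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, mirroring the question's sample data
df = pd.DataFrame({'email': ['Sam@mail.com', 'Doug@mail.com', np.nan],
                   'price': [25.95, 34.95, 59.95]})

def clean_emails(s):
    emails = {x: str(x).lower() for x in s.unique()}
    return s.map(emails)

# DataFrame.apply passes each column as a Series, so .unique() exists
received = df.apply(lambda col: type(col).__name__)

# Series.apply passes single elements (str or float here), which lack .unique()
try:
    df['email'].apply(clean_emails)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(received.tolist())
print(error_name)
```

So calling `clean_emails(df['email'])` directly, or using `df.apply(clean_emails)`, fits the function as written; `df['email'].apply(clean_emails)` does not.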

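A few things to keep in mind:

Your function applies `str` to the `NaN` values and converts them to strings, which `pd.isnull` will then no longer recognize as missing. I suppose you don't want that.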
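A short sketch of that pitfall, using a made-up Series `s` (the names here are illustrative only): calling `str` on a missing value yields the literal text 'nan', which pandas no longer treats as missing, while the `.str` accessor leaves missing values alone.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Sam@mail.com', np.nan])  # hypothetical sample

# str(np.nan) is the text 'nan', so the missing value becomes a real string
as_text = s.map(lambda x: str(x).lower())

# The .str accessor skips missing values and keeps them missing
via_accessor = s.str.lower()

text_null_flags = pd.isnull(as_text).tolist()
accessor_null_flags = pd.isnull(via_accessor).tolist()
print(text_null_flags)
print(accessor_null_flags)
```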

I forgot :)

import numpy as np
import pandas as pd

sample_data = pd.DataFrame({'email': ['Sam@mail.com', 'Sam@mail.com', 'Doug@mail.com', 'Doug@mail.com', np.nan, np.nan],
                            'price': [25.95, 31.25, 34.95, 19.95, 59.95, 15.75]})
sample_data.email = sample_data.email.str.lower()

You can also do this:

email_dict = {el: el.lower() for el in sample_data.email.unique() if pd.notnull(el)}
sample_data.email = sample_data.email.replace(email_dict)
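
As a quick check that the two approaches agree, the sketch below rebuilds the sample column under the hypothetical name `emails` and compares the results. Under these assumptions both versions lowercase the addresses and leave the missing entries as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's email column
emails = pd.Series(['Sam@mail.com', 'Sam@mail.com',
                    'Doug@mail.com', 'Doug@mail.com',
                    np.nan, np.nan])

# Approach 1: vectorized string accessor
via_str = emails.str.lower()

# Approach 2: build the mapping once per unique non-null value, then replace
email_dict = {el: el.lower() for el in emails.unique() if pd.notnull(el)}
via_replace = emails.replace(email_dict)

# Series.equals treats NaNs in the same positions as equal
same = via_str.equals(via_replace)
print(same)
```

The dict-based version only lowercases each distinct address once, which is the point raised in the comments about repeated values.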

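For a complete DataFrame, you should use `applymap`.

In a larger `df`, this gets very slow.

@YaakovBressler Why? Is it because it creates a new Series object?

Each row is converted to a string. Mapping through a dict speeds things up for repeated values.

@YaakovBressler Right. I assumed the data is not very large, and I wanted to show that pandas has built-in accessors for the string representation of (non-null) elements. I think `replace` is faster, and I think that is what the OP wanted to use. I edited my answer.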
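
For the whole-frame case mentioned in the answer, a minimal sketch might look like the following. The frame `df` and the helper `lower_if_str` are hypothetical names introduced here. Note that `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`, so the sketch picks whichever is available.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text column and one numeric column
df = pd.DataFrame({'email': ['Sam@mail.com', np.nan],
                   'price': [25.95, 59.95]})

def lower_if_str(value):
    # Only touch real strings; NaN and numbers pass through unchanged
    return value.lower() if isinstance(value, str) else value

# Use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = getattr(df, 'map', None) or df.applymap
cleaned = elementwise(lower_if_str)

print(cleaned)
```

This calls the Python function once per cell, which is consistent with the comment that it gets slow on larger frames; for a single text column, `.str.lower()` or the dict-based `replace` avoids that per-cell overhead.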