Data cleaning in Python pandas: replace None values with specific strings if those strings are contained in another column


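I am currently working on a car emissions dataset in which I want to clean and standardize the model names. The dataset is fairly large, but here are the first 10 rows: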

import pandas as pd

cars_em_df = pd.DataFrame({
    'manufacturer_name_mapped': ['FIAT', 'FIAT', 'FIAT', 'FIAT', 'FIAT', 'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG'],
    'commercial_name': ['124 gt multiair auto', '500l wagon pop star t-jet', 'doblo combi 1.4 95', 'panda  0.9t sge 85 natural power', 'punto 1.4  77 lpg', 'x4 xdrive20d se auto', '216d active tourer b37 f45', '220d gran tourer b47 f46', 'x1 xdrive18d sport', '320i xdrive m sport gt auto'],
    'fuel_type_mapped': ['Petrol', 'Petrol', 'Petrol', 'NG-Biomethane', 'LPG', 'Diesel', 'Diesel', 'Diesel', 'Diesel', 'Petrol'],
    'file_year': [2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
    'emissions': [153, 158, 165, 86, 114, 131, 166, 200, 151, 149],
    'commercial_name_cleaned': ['124', '500', None, 'panda', 'punto', 'x4', None, None, 'x1', None],
})

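The 'commercial_name_cleaned' column on the right is the result of my first cleaning pass, in which I matched the names in the 'commercial_name' column against a list of standardized names from a different source. As you can see, these are very simple, short names. Whenever I could not match a model name, my function returned None.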

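As a second step, I now want to do the following: where the value is None, search the adjacent 'commercial_name' column for specific strings and replace the None with a model name that I specify. I tried this: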

    def str_ops(commercial_name_cleaned,commercial_name):
          if commercial_name_cleaned == None:
             if '216' in commercial_name:
                return '2-series'
             elif '220' in commercial_name:
                return '2-series'
             elif '320' in commercial_name:
                return '3-series'
I would then apply this function to the DataFrame:

cars_em_df['commercial_name_cleaned'] = cars_em_df.apply(lambda x: str_ops(str(x.commercial_name_cleaned), str(x.commercial_name)), axis=1)
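Note that if '320', '220', etc. are not found in 'commercial_name', the function should change nothing and simply return the value already in 'commercial_name_cleaned'. However, when I apply the function, the entire 'commercial_name_cleaned' column turns into None values. So something is definitely wrong with the function. Does anyone know how to fix this?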


Any help is much appreciated, thanks.

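You get None values in the commercial_name_cleaned column because str_ops never reaches a return statement: when a Python function ends without explicitly returning anything, it implicitly returns None. The cause is that your apply call wraps both arguments in str(), so a missing value arrives as the string 'None', and the comparison with == None is never true. There is also no branch that returns the existing value for rows that were already matched.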
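To make the cause concrete, here is a minimal sketch that reproduces the behaviour without pandas. It reuses the asker's function unchanged; the variable names `converted`, `result_missing` and `result_present` are only for illustration.

```python
def str_ops(commercial_name_cleaned, commercial_name):
    # The original function from the question, logic unchanged.
    if commercial_name_cleaned == None:
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
    # No return here, so Python returns None implicitly.

# str() turns a missing value into the five-character string 'None'.
converted = str(None)
print(repr(converted))    # 'None'
print(converted == None)  # False

# Both a missing row and an already-matched row fall through to the end.
result_missing = str_ops(str(None), '216d active tourer b37 f45')
result_present = str_ops(str('x4'), 'x4 xdrive20d se auto')
print(result_missing, result_present)  # None None
```

Because the outer `if` is skipped for every row, the whole column ends up as None.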

Replace:

def str_ops(commercial_name_cleaned,commercial_name):
    if commercial_name_cleaned == None:
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
with:

def str_ops(commercial_name_cleaned,commercial_name):
    if commercial_name_cleaned == 'None':
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
    else:
        return commercial_name_cleaned
Output:

manufacturer_name_mapped                   commercial_name  ... emissions  commercial_name_cleaned
0                     FIAT              124 gt multiair auto  ...       153                      124
1                     FIAT         500l wagon pop star t-jet  ...       158                      500
2                     FIAT                doblo combi 1.4 95  ...       165                     None
3                     FIAT  panda  0.9t sge 85 natural power  ...        86                    panda
4                     FIAT                 punto 1.4  77 lpg  ...       114                    punto
5                   BMW AG              x4 xdrive20d se auto  ...       131                       x4
6                   BMW AG        216d active tourer b37 f45  ...       166                 2-series
7                   BMW AG          220d gran tourer b47 f46  ...       200                 2-series
8                   BMW AG                x1 xdrive18d sport  ...       151                       x1
9                   BMW AG       320i xdrive m sport gt auto  ...       149                 3-series
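
As an alternative to comparing against the string 'None', one could skip the str() conversion and test for missing values with `pd.isna`. The following is only a sketch under that assumption: `fill_model_name` is a hypothetical name, and the small DataFrame is a subset of the rows from the question.

```python
import pandas as pd

def fill_model_name(cleaned, commercial_name):
    # Keep any value that was already matched in the first pass.
    if not pd.isna(cleaned):
        return cleaned
    if '216' in commercial_name or '220' in commercial_name:
        return '2-series'
    if '320' in commercial_name:
        return '3-series'
    # No rule matched: leave the value missing.
    return cleaned

df = pd.DataFrame({
    'commercial_name': ['doblo combi 1.4 95', 'x4 xdrive20d se auto',
                        '216d active tourer b37 f45', '320i xdrive m sport gt auto'],
    'commercial_name_cleaned': [None, 'x4', None, None],
})

# Pair each cleaned value with its raw name and rebuild the column.
df['commercial_name_cleaned'] = [
    fill_model_name(cleaned, name)
    for cleaned, name in zip(df['commercial_name_cleaned'], df['commercial_name'])
]
result = df['commercial_name_cleaned'].tolist()
print(result)
```

Every path through the function now returns something explicitly, so rows that were already matched keep their value and unmatched rows stay missing.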

How many of these conditions do you expect to use, and how complex is their logic? For example, is it always a simple substring test, or something more complex?
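
If the conditions stay simple substring tests but grow in number, a lookup table may scale better than a chain of elif branches. The sketch below assumes exactly that: `RULES` and `fill_from_rules` are hypothetical names, and the rules are applied in dictionary order, so an earlier rule wins when several substrings match.

```python
import pandas as pd

# Hypothetical lookup table: substring to search for -> model name to assign.
RULES = {'216': '2-series', '220': '2-series', '320': '3-series'}

def fill_from_rules(df, rules):
    out = df.copy()
    for needle, model in rules.items():
        # Rows that are still missing and whose raw name contains the substring.
        still_missing = out['commercial_name_cleaned'].isna()
        has_needle = out['commercial_name'].str.contains(needle, regex=False)
        out.loc[still_missing & has_needle, 'commercial_name_cleaned'] = model
    return out

df = pd.DataFrame({
    'commercial_name': ['doblo combi 1.4 95', '216d active tourer b37 f45',
                        '220d gran tourer b47 f46', 'x1 xdrive18d sport',
                        '320i xdrive m sport gt auto'],
    'commercial_name_cleaned': [None, None, None, 'x1', None],
})

filled = fill_from_rules(df, RULES)
values = filled['commercial_name_cleaned'].tolist()
print(values)
```

Note that a plain substring test can match inside unrelated tokens (for example an engine size or power figure), so more complex names would likely need stricter patterns.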