Python: Is there an efficient way to fill in the correct values for inconsistent data using a second table when your data is large?

Tags: python, pandas, data-science, data-analysis, fuzzywuzzy


I have a table with inconsistent data, as shown below:

Table 1:

flight_id        engine_number  aircraft_tail  year  month
000000_20180121  000000         G-RHBZ         2018  01
258741_20171021  258741         H-RZBE         2017  10
_20150214                       V-RDER         2015  02
_20110287        NO-NUMBER      G-EHRK         2011  12
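I think you can use merge. For the sample data, the code below gives:

flight_id        engine_number_bad  aircraft_tail  year  month  engine_number_good
000000_20180121  000000             G-RHBZ         2018  01     589745
_20150214                           V-RDER         2015  02     348741
_20110287        NO-NUMBER          G-EHRK         2011  12     587981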
Yes, it works. I will test it on the full dataset and check the execution time. Thank you very much.
import pandas as pd

df1 = pd.DataFrame(data=[
    {"flight_id":"000000_20180121","engine_number":"000000",
     "aircraft_tail":"G-RHBZ","year":"2018","month":"01"},
    {"flight_id":"258741_20171021","engine_number":"258741",
     "aircraft_tail":"H-RZBE","year":"2017","month":"10"},
    {"flight_id":"_20150214","engine_number":"",
     "aircraft_tail":"V-RDER","year":"2015","month":"02"},
    {"flight_id":"_20110287","engine_number":"NO-NUMBER", 
     "aircraft_tail":"G-EHRK","year":"2011","month":"12"}]
)
df2 = pd.DataFrame(data=[
    {"engine_number":"258741","aircraft_tail":"H-RZBE","year":"2017","month":"10"},
    {"engine_number":"348741","aircraft_tail":"V-RDER","year":"2015","month":"02"},
    {"engine_number":"348741","aircraft_tail":"V-RDER","year":"2015","month":"03"},
    {"engine_number":"589745","aircraft_tail":"G-RHBZ","year":"2018","month":"01"},
    {"engine_number":"587981","aircraft_tail":"G-EHRK","year":"2011","month":"12"}]
    )

# Validator function: flags empty strings and known placeholder values
def bad_engine_number_detector(engine_number):
    invalid_engine_numbers = {"", "000000", "NO-NUMBER"}
    return engine_number in invalid_engine_numbers
    
# Identify invalid entries on df1
mask = df1["engine_number"].apply(bad_engine_number_detector)

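# Merge both tables (df1 filtered to the bad entries only). The default inner
# join keeps only rows whose aircraft_tail/year/month also appear in df2.
result = df1.loc[mask].merge(df2,
                             on=["aircraft_tail", "year", "month"],
                             suffixes=("_bad", "_good"))
print(result)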
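The merge above returns only the bad rows next to their good values. If the goal is to write those values back into the first table, and to avoid a row-by-row apply on large data, something along these lines might work. This is a minimal sketch rather than part of the original answer: it assumes that each (aircraft_tail, year, month) combination should map to a single engine number (duplicates in df2 are dropped, keeping the first), and the names invalid_values, lookup, fixable and fixed are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "flight_id": ["000000_20180121", "258741_20171021", "_20150214", "_20110287"],
    "engine_number": ["000000", "258741", "", "NO-NUMBER"],
    "aircraft_tail": ["G-RHBZ", "H-RZBE", "V-RDER", "G-EHRK"],
    "year": ["2018", "2017", "2015", "2011"],
    "month": ["01", "10", "02", "12"],
})
df2 = pd.DataFrame({
    "engine_number": ["258741", "348741", "348741", "589745", "587981"],
    "aircraft_tail": ["H-RZBE", "V-RDER", "V-RDER", "G-RHBZ", "G-EHRK"],
    "year": ["2017", "2015", "2015", "2018", "2011"],
    "month": ["10", "02", "03", "01", "12"],
})

keys = ["aircraft_tail", "year", "month"]

# Vectorized equivalent of the validator: empty string or a known placeholder.
invalid_values = ["", "000000", "NO-NUMBER"]
mask = df1["engine_number"].isin(invalid_values)

# Assumption: one engine number per key combination; keep the first if not.
lookup = df2.drop_duplicates(subset=keys)

# A left merge keeps every row of df1 in its original order.
# validate raises MergeError if lookup has repeated key combinations.
merged = df1.merge(lookup, on=keys, how="left",
                   suffixes=("", "_good"), validate="many_to_one")

# Overwrite only the bad entries for which df2 actually supplied a value.
fixable = mask.to_numpy() & merged["engine_number_good"].notna().to_numpy()
fixed = df1.copy()
fixed.loc[fixable, "engine_number"] = (
    merged.loc[fixable, "engine_number_good"].to_numpy()
)

print(fixed)
```

Rows that are flagged as bad but have no match in df2 keep their original value, so they can still be found afterwards with the same mask.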