Python: fill a DataFrame column from another DataFrame based on a partial string match (Python, String, Pandas, Dataframe) - Fatal编程技术网


I am new to Python programming. I have two DataFrames: df1 contains tags (180k rows) and df2 contains equipment names (1,600 rows).

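df1:

            Line    TagName                          CLASS
    187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL     10
    187878  PT_WOA  .ZS01_RB2202_T05.SB.S2385_FLOK      10
    187879  PT_WOA  .ZS01_LA120_T05.SB._CBAbsHy         10
    187880  PT_WOA  .ZS01_LA120_T05.SB.S3110_CBAPV      10
    187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy     10

df2:

       EquipmentNo  EquipmentDescription  Equipment
    0      1311256  Lifting table         LA120
    1      1311257  Roller bed            RB2200
    2      1311258  Lifting table         LT2202
    3      1311259  Roller bed            RB2202
    4      1311260  Roller bed            RB2204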

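df2.Equipment appears somewhere inside the strings in df1.TagName. I need to match rows based on whether a df2 Equipment value is contained in a df1 TagName, and then bring the corresponding df2 values (EquipmentDescription and EquipmentNo) over to df1.

The final output should be: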

            Line    TagName                          EquipmentDescription  EquipmentNo
    187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL  Lifting table         1311256
    187878  PT_WOA  .ZS01_RB2202_T05.SB.S2385_FLOK   Roller bed            1311259
    187879  PT_WOA  .ZS01_LA120_T05.SB._CBAbsHy      Lifting table         1311256
    187880  PT_WOA  .ZS01_LA120_T05.SB.S3110_CBAPV   Lifting table         1311256
    187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy  Roller bed            1311260
What I have tried so far:

cols= df2['Equipment'].tolist()
Xs=[]
for i in cols:
    Test = df1.loc[df1.TagName.str.contains(i)] 
    Test['Equip']=i
    Xs.append(Test)
Then I merge Xs with df2 on the equipment code.

But I get this error:

first argument must be string or compiled pattern
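This error most likely means that at least one value in df2['Equipment'] is not a string (for example NaN for a missing code, or a number): str.contains treats its argument as a regular expression by default, and Python's re module raises exactly this message when asked to compile a non-string pattern. Below is a hedged sketch of a repaired loop, not the asker's final code; the three-row df1 and the df2 with a missing code are hypothetical stand-ins for the real data. It drops missing codes, converts the rest to str, matches them literally with regex=False, copies each slice to avoid SettingWithCopyWarning, and then does the planned merge.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames; the last Equipment value is missing (None)
df1 = pd.DataFrame(
    {"Line": ["PT_WOA"] * 3,
     "TagName": [".ZS01_LA120_T05.SB.S2384_LesSwL",
                 ".ZS01_RB2202_T05.SB.S2385_FLOK",
                 ".ZS01_LARB2204.SB.S3111_CBRelHy"]},
    index=[187877, 187878, 187881])
df2 = pd.DataFrame(
    {"EquipmentNo": [1311256, 1311259, 1311260, 1311261],
     "EquipmentDescription": ["Lifting table", "Roller bed", "Roller bed", "Unknown"],
     "Equipment": ["LA120", "RB2202", "RB2204", None]})

Xs = []
# Drop missing codes and make sure every remaining code is a str
for code in df2["Equipment"].dropna().astype(str).unique():
    # regex=False: look for the code as a plain substring, no regex compilation
    mask = df1["TagName"].str.contains(code, regex=False)
    test = df1.loc[mask].copy()  # an explicit copy avoids SettingWithCopyWarning
    test["Equipment"] = code
    Xs.append(test)

# Stack the partial results, then attach EquipmentDescription and EquipmentNo
matched = pd.concat(Xs)
result = (matched.reset_index()
          .merge(df2, on="Equipment", how="left")
          .set_index("index"))
print(result)
```

With 1,600 codes this still runs 1,600 vectorized str.contains calls, which is usually acceptable for 180k tags.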


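I would do it like this:

  • Create a new column index in which, for each Equipment in df2, you look up the list of df1 indices where df1.TagName contains that Equipment
  • Flatten index with stack() and reset_index(), creating one row per item
  • Join the flattened df2 with df1 to get all the information you need
  • Output: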

              Line                          TagName EquipmentDescription  EquipmentNo
    187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL        Lifting table      1311256
    187879  PT_WOA      .ZS01_LA120_T05.SB._CBAbsHy        Lifting table      1311256
    187880  PT_WOA   .ZS01_LA120_T05.SB.S3110_CBAPV        Lifting table      1311256
    187878  PT_WOA   .ZS01_RB2202_T05.SB.S2385_FLOK           Roller bed      1311259
    187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy           Roller bed      1311260
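The code for this answer did not survive, so the following is only a sketch of the three steps under stated assumptions. It uses Series.explode() (available since pandas 0.25) in place of the stack() / reset_index() combination described above, since both flatten a column of lists into one row per element; the column name index follows the description, and regex=False is an addition so that equipment codes are matched literally.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Line": ["PT_WOA"] * 5,
     "TagName": [".ZS01_LA120_T05.SB.S2384_LesSwL",
                 ".ZS01_RB2202_T05.SB.S2385_FLOK",
                 ".ZS01_LA120_T05.SB._CBAbsHy",
                 ".ZS01_LA120_T05.SB.S3110_CBAPV",
                 ".ZS01_LARB2204.SB.S3111_CBRelHy"]},
    index=[187877, 187878, 187879, 187880, 187881])
df2 = pd.DataFrame(
    {"EquipmentNo": [1311256, 1311257, 1311258, 1311259, 1311260],
     "EquipmentDescription": ["Lifting table", "Roller bed", "Lifting table",
                              "Roller bed", "Roller bed"],
     "Equipment": ["LA120", "RB2200", "LT2202", "RB2202", "RB2204"]})

# Step 1: for every Equipment code, the list of df1 index labels whose TagName contains it
df2["index"] = df2["Equipment"].apply(
    lambda code: df1.loc[df1["TagName"].str.contains(code, regex=False)].index.tolist())

# Step 2: one row per (Equipment, df1 label); codes without a match become NaN and are dropped
flat = df2.explode("index").dropna(subset=["index"])
flat["index"] = flat["index"].astype("int64")

# Step 3: join the flattened df2 back onto df1 by index label
matched = flat.set_index("index")[["EquipmentDescription", "EquipmentNo"]]
result = df1[["Line", "TagName"]].join(matched, how="inner")
print(result)
```

If a TagName contains several Equipment codes, it appears once per matching code in result.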
    


Initializing the provided DataFrames:

    import numpy as np
    import pandas as pd
    
    df1 = pd.DataFrame([['PT_WOA', '.ZS01_LA120_T05.SB.S2384_LesSwL', 10],
                        ['PT_WOA', '.ZS01_RB2202_T05.SB.S2385_FLOK', 10],
                        ['PT_WOA', '.ZS01_LA120_T05.SB._CBAbsHy', 10],
                        ['PT_WOA', '.ZS01_LA120_T05.SB.S3110_CBAPV', 10],
                        ['PT_WOA', '.ZS01_LARB2204.SB.S3111_CBRelHy', 10]],
                       columns = ['Line', 'TagName', 'CLASS'],
                       index = [187877, 187878, 187879, 187880, 187881])
    
    df2 = pd.DataFrame([[1311256, 'Lifting table', 'LA120'],
                        [1311257, 'Roller bed', 'RB2200'],
                        [1311258, 'Lifting table', 'LT2202'],
                        [1311259, 'Roller bed', 'RB2202'],
                        [1311260, 'Roller bed', 'RB2204']],
                      columns = ['EquipmentNo', 'EquipmentDescription', 'Equipment'])
    
I suggest the following:

    # create a copy of df1, dropping the 'CLASS' column
    df3 = df1.drop(columns=['CLASS'])
    
    # add the columns 'EquipmentDescription' and 'EquipmentNo', initially empty
    # (None gives an object column that can hold strings, NaN gives a float column)
    df3['EquipmentDescription'] = None
    df3['EquipmentNo'] = np.nan
    
    # for each row in df3, iterate over each row in df2
    for index_df3, row_df3 in df3.iterrows():
        for index_df2, row_df2 in df2.iterrows():
    
            # check if 'Equipment' is a substring of 'TagName'
            if row_df2['Equipment'] in row_df3['TagName']:
    
                # set 'EquipmentDescription' and 'EquipmentNo'
                # (if several codes match, the last matching row of df2 wins)
                df3.loc[index_df3, 'EquipmentDescription'] = row_df2['EquipmentDescription']
                df3.loc[index_df3, 'EquipmentNo'] = row_df2['EquipmentNo']
    
    
    # convert 'EquipmentNo' to a nullable integer type, so that tags without
    # a match (still NaN) do not make the conversion fail
    df3['EquipmentNo'] = df3['EquipmentNo'].astype('Int64')
    
    
This produces the following DataFrame:

            Line    TagName                         EquipmentDescription EquipmentNo
    187877  PT_WOA  .ZS01_LA120_T05.SB.S2384_LesSwL Lifting table        1311256
    187878  PT_WOA  .ZS01_RB2202_T05.SB.S2385_FLOK  Roller bed           1311259
    187879  PT_WOA  .ZS01_LA120_T05.SB._CBAbsHy     Lifting table        1311256
    187880  PT_WOA  .ZS01_LA120_T05.SB.S3110_CBAPV  Lifting table        1311256
    187881  PT_WOA  .ZS01_LARB2204.SB.S3111_CBRelHy Roller bed           1311260
    

Let me know if this helps.
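The nested iterrows() loop above performs one Python-level comparison per (tag, equipment) pair, roughly 180,000 × 1,600 ≈ 288 million for the data in the question, so it is likely to be slow at that scale. A vectorized alternative, not taken from this answer, is sketched below: build one regular expression from the escaped equipment codes, let Series.str.extract find a code in each TagName, and look up the description and number with Series.map. Note the different rule for tags containing several codes: str.extract keeps the leftmost match, whereas the loop keeps the last matching row of df2.

```python
import re

import pandas as pd

df1 = pd.DataFrame(
    {"Line": ["PT_WOA"] * 5,
     "TagName": [".ZS01_LA120_T05.SB.S2384_LesSwL",
                 ".ZS01_RB2202_T05.SB.S2385_FLOK",
                 ".ZS01_LA120_T05.SB._CBAbsHy",
                 ".ZS01_LA120_T05.SB.S3110_CBAPV",
                 ".ZS01_LARB2204.SB.S3111_CBRelHy"]},
    index=[187877, 187878, 187879, 187880, 187881])
df2 = pd.DataFrame(
    {"EquipmentNo": [1311256, 1311257, 1311258, 1311259, 1311260],
     "EquipmentDescription": ["Lifting table", "Roller bed", "Lifting table",
                              "Roller bed", "Roller bed"],
     "Equipment": ["LA120", "RB2200", "LT2202", "RB2202", "RB2204"]})

# One row per distinct, non-missing code, stored as str
eq = (df2.dropna(subset=["Equipment"])
         .astype({"Equipment": str})
         .drop_duplicates("Equipment"))

# Escape the codes so characters such as "." match literally; longer codes go first,
# so that at the same position a longer code wins over one of its prefixes
codes = sorted(eq["Equipment"], key=len, reverse=True)
pattern = "(" + "|".join(map(re.escape, codes)) + ")"

df3 = df1[["Line", "TagName"]].copy()
# Leftmost matching code in each TagName, NaN where no code occurs
df3["Equipment"] = df3["TagName"].str.extract(pattern, expand=False)

# Map description and number from the code; the index of df3 is preserved
lookup = eq.set_index("Equipment")
df3["EquipmentDescription"] = df3["Equipment"].map(lookup["EquipmentDescription"])
df3["EquipmentNo"] = df3["Equipment"].map(lookup["EquipmentNo"]).astype("Int64")
print(df3)
```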


  • Given df1 and df2 as follows:

    df1
    |    | Line   | TagName                         |   CLASS |
    |---:|:-------|:--------------------------------|--------:|
    |  0 | PT_WOA | .ZS01_LA120_T05.SB.S2384_LesSwL |      10 |
    |  1 | PT_WOA | .ZS01_RB2202_T05.SB.S2385_FLOK  |      10 |
    |  2 | PT_WOA | .ZS01_LA120_T05.SB._CBAbsHy     |      10 |
    |  3 | PT_WOA | .ZS01_LA120_T05.SB.S3110_CBAPV  |      10 |
    |  4 | PT_WOA | .ZS01_LARB2204.SB.S3111_CBRelHy |      10 |

    df2
    |    |   EquipmentNo | EquipmentDescription   | Equipment   |
    |---:|--------------:|:-----------------------|:------------|
    |  0 |       1311256 | Lifting table          | LA120       |
    |  1 |       1311257 | Roller bed             | RB2200      |
    |  2 |       1311258 | Lifting table          | LT2202      |
    |  3 |       1311259 | Roller bed             | RB2202      |
    |  4 |       1311260 | Roller bed             | RB2204      |

  • Create a list of the unique Equipment values in df2:
      equipment = df2.Equipment.unique().tolist()

  • Create an Equipment column in df1 by checking which codes in equipment occur in each TagName:
      df1['Equipment'] = df1['TagName'].apply(lambda x: ''.join([part for part in equipment if part in x]))

  • Merge df1 and df2 on Equipment into the final form
    • If you don't want the Equipment column in df_final, add .drop(columns=['Equipment']) to the end of the next line of code
      df_final = df1[['Line', 'TagName', 'Equipment']].merge(df2, on='Equipment')

    df_final
    |    | Line   | TagName                         | Equipment   |   EquipmentNo | EquipmentDescription   |
    |---:|:-------|:--------------------------------|:------------|--------------:|:-----------------------|
    |  0 | PT_WOA | .ZS01_LA120_T05.SB.S2384_LesSwL | LA120       |       1311256 | Lifting table          |
    |  1 | PT_WOA | .ZS01_LA120_T05.SB._CBAbsHy     | LA120       |       1311256 | Lifting table          |
    |  2 | PT_WOA | .ZS01_LA120_T05.SB.S3110_CBAPV  | LA120       |       1311256 | Lifting table          |
    |  3 | PT_WOA | .ZS01_RB2202_T05.SB.S2385_FLOK  | RB2202      |       1311259 | Roller bed             |
    |  4 | PT_WOA | .ZS01_LARB2204.SB.S3111_CBRelHy | RB2204      |       1311260 | Roller bed             |
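Put together with the sample data, the steps of this answer could look like the sketch below. Two side effects of the ''.join(...) step are easy to miss: a TagName that contains none of the codes gets an empty string, and one that contains two codes gets them glued together (for example 'LA120RB2204'). In both cases the inner merge finds no matching Equipment and silently drops that row. The final unmatched check is an addition for illustration, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Line": ["PT_WOA"] * 5,
     "TagName": [".ZS01_LA120_T05.SB.S2384_LesSwL",
                 ".ZS01_RB2202_T05.SB.S2385_FLOK",
                 ".ZS01_LA120_T05.SB._CBAbsHy",
                 ".ZS01_LA120_T05.SB.S3110_CBAPV",
                 ".ZS01_LARB2204.SB.S3111_CBRelHy"],
     "CLASS": [10] * 5})
df2 = pd.DataFrame(
    {"EquipmentNo": [1311256, 1311257, 1311258, 1311259, 1311260],
     "EquipmentDescription": ["Lifting table", "Roller bed", "Lifting table",
                              "Roller bed", "Roller bed"],
     "Equipment": ["LA120", "RB2200", "LT2202", "RB2202", "RB2204"]})

# Unique equipment codes from df2
equipment = df2.Equipment.unique().tolist()

# Concatenation of every code found in the TagName ('' when none is found)
df1["Equipment"] = df1["TagName"].apply(
    lambda x: "".join([part for part in equipment if part in x]))

# Inner merge: rows whose Equipment is not exactly one df2 code are dropped
df_final = df1[["Line", "TagName", "Equipment"]].merge(df2, on="Equipment")
print(df_final)

# Addition: tags that matched zero or several codes and were therefore dropped
unmatched = df1.loc[~df1["Equipment"].isin(df2["Equipment"]), "TagName"]
print(unmatched)
```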
    