Python: check multiple conditions on the same row

python python-3.x pandas dataframe

I have to compare two different sources and find all the mismatches for every id.

Source_excel table:

+-----+-------------+------+----------+
| id  | name        | City | flag     |
+-----+-------------+------+----------+
| 101 | Plate       | NY   | Ready    |
| 102 | Back washer | NY   | Sold     |
| 103 | Ring        | MC   | Planning |
| 104 | Glass       | NMC  | Ready    |
| 107 | Cover       | PR   | Ready    |
+-----+-------------+------+----------+

Source_dw table:

+-----+----------+------+----------+
| id  | name     | City | flag     |
+-----+----------+------+----------+
| 101 | Plate    | NY   | Planning |
| 102 | Nut      | TN   | Expired  |
| 103 | Ring     | MC   | Planning |
| 104 | Top Wire | NY   | Ready    |
| 105 | Bolt     | MC   | Expired  |
+-----+----------+------+----------+

Expected result:

+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID  | excel_name  | dw_name  | excel_flag | dw_flag  | excel_city | dw_city | RESULT           |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate       | Plate    | Ready      | Planning | NY         | NY      | FLAG_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | NAME_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | FLAG_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | CITY_MISMATCH    |
| 103 | Ring        | Ring     | Planning   | Planning | MC         | MC      | ALL_MATCH        |
| 104 | Glass       | Top Wire | Ready      | Ready    | NMC        | NY      | NAME_MISMATCH    |
| 104 | Glass       | Top Wire | Ready      | Ready    | NMC        | NY      | CITY_MISMATCH    |
| 107 | Cover       |          | Ready      |          | PR         |         | MISSING IN DW    |
| 105 |             | Bolt     |            | Expired  |            | MC      | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+

I am new to Python. I tried the code below, but it does not give the expected result:

import pandas as pd

source_excel = pd.read_csv('C:/Mypython/Newyork/excel.csv',encoding = "ISO-8859-1")
source_dw = pd.read_csv('C:/Mypython/Newyork/dw.csv',encoding = "ISO-8859-1")
comparison_result = pd.merge(source_excel,source_dw,on='ID',how='outer',indicator=True)

 
comparison_result.loc[(comparison_result['_merge'] == 'both') & (name_x != name_y), 'Result'] = 'NAME_MISMATCH' 
comparison_result.loc[(comparison_result['_merge'] == 'both') & (city_x != city_y), 'Result'] = 'CITY_MISMATCH'
comparison_result.loc[(comparison_result['_merge'] == 'both') & (flag_x != flag_y), 'Result'] = 'FLAG_MISMATCH'     
comparison_result.loc[comparison_result['_merge'] == 'left_only', 'Result'] = 'Missing in dw'  
comparison_result.loc[comparison_result['_merge'] == 'right_only', 'Result'] = 'Missing in excel'  
comparison_result.loc[comparison_result['_merge'] == 'both', 'Result'] = 'ALL_Match' 

csv_column = comparison_result[['ID','name_x','name_y','city_x','city_y','flag_x','flag_y','Result']]
print(csv_column)

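Is there another way to check all the conditions and report each one on a separate row? If separate rows are not possible, I at least need all the mismatches listed in the same column, separated by a delimiter. Something like:

FLAG_MISMATCH, CITY_MISMATCH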

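You can do the following:

# how='outer' keeps IDs that exist in only one source (105 and 107 in the sample data)
df = pd.merge(source_excel, source_dw, on='ID', how='outer', suffixes=(None, '_dw'))

This creates a new DataFrame with the columns you want, although you will have to reorder them as needed. Note that '_dw' is a suffix here, not a prefix, so the columns coming from source_dw are named name_dw, City_dw and flag_dw, while the columns from source_excel keep their original names.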
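As a quick check of what such a merge produces, here is a minimal sketch built from the sample rows in the question. It assumes the key column is named ID in both frames (as in the question's code), uses how='outer', and adds indicator=True, which the answer does not use, to show which IDs exist in only one source.

```python
import pandas as pd

# Sample rows copied from the two tables in the question.
# Assumption: the key column is called 'ID' in both sources.
source_excel = pd.DataFrame({
    'ID': [101, 102, 103, 104, 107],
    'name': ['Plate', 'Back washer', 'Ring', 'Glass', 'Cover'],
    'City': ['NY', 'NY', 'MC', 'NMC', 'PR'],
    'flag': ['Ready', 'Sold', 'Planning', 'Ready', 'Ready'],
})
source_dw = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'name': ['Plate', 'Nut', 'Ring', 'Top Wire', 'Bolt'],
    'City': ['NY', 'TN', 'MC', 'NY', 'MC'],
    'flag': ['Planning', 'Expired', 'Planning', 'Ready', 'Expired'],
})

# Left-hand columns keep their names; right-hand columns get the '_dw' suffix.
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(source_excel, source_dw, on='ID', how='outer',
                  suffixes=(None, '_dw'), indicator=True)

print(merged.columns.tolist())
print(merged[['ID', '_merge']])
```

Rows marked left_only are missing in the warehouse data and rows marked right_only are missing in the Excel data, which covers the two MISSING cases in the expected result.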

You can reorder the columns with the following code:

# List the columns in the order you want them to appear
df = df[['ID', 'name', 'name_dw', 'flag', 'flag_dw', 'City', 'City_dw']]

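For the result column, I think you have to create one column for each condition you want to check (at least, that is the way I know how to do it). Here is an example:

# 1 if the flags match, 0 otherwise (a vectorized comparison, no row-wise apply needed)
df['result_flag'] = (df['flag'] == df['flag_dw']).astype(int)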
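If one row per ID is acceptable, the per-condition checks can be folded into a single comma-separated column, which is the fallback the question asks for. The sketch below is one possible way to do it, not the answerer's code: the suffixes ('_excel', '_dw'), the checks dictionary and the describe helper are all names chosen here for illustration, and the key column is assumed to be ID.

```python
import pandas as pd

# Sample rows copied from the question (key column assumed to be 'ID').
source_excel = pd.DataFrame({
    'ID': [101, 102, 103, 104, 107],
    'name': ['Plate', 'Back washer', 'Ring', 'Glass', 'Cover'],
    'City': ['NY', 'NY', 'MC', 'NMC', 'PR'],
    'flag': ['Ready', 'Sold', 'Planning', 'Ready', 'Ready'],
})
source_dw = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'name': ['Plate', 'Nut', 'Ring', 'Top Wire', 'Bolt'],
    'City': ['NY', 'TN', 'MC', 'NY', 'MC'],
    'flag': ['Planning', 'Expired', 'Planning', 'Ready', 'Expired'],
})

merged = source_excel.merge(source_dw, on='ID', how='outer',
                            suffixes=('_excel', '_dw'), indicator=True)

# Hypothetical mapping: column to compare -> label to report on mismatch.
checks = {
    'name': 'NAME_MISMATCH',
    'flag': 'FLAG_MISMATCH',
    'City': 'CITY_MISMATCH',
}

def describe(row):
    """Return the label(s) for one merged row."""
    if row['_merge'] == 'left_only':
        return 'MISSING IN DW'
    if row['_merge'] == 'right_only':
        return 'MISSING IN EXCEL'
    problems = [label for col, label in checks.items()
                if row[f'{col}_excel'] != row[f'{col}_dw']]
    return ', '.join(problems) if problems else 'ALL_MATCH'

merged['Result'] = merged.apply(describe, axis=1)
print(merged[['ID', 'Result']])
```

The missing cases are tested first, so a row that exists in only one source never reaches the column comparisons, where the empty side would otherwise count as a mismatch.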

Here is a scoring approach:

df['result'] = 0

# the repeated mask / df.loc statements suggest a loop over a list of tuples
mask = df['excel_flag'] != df['dw_flag']
df.loc[mask, 'result'] += 1

mask = df['excel_name'] != df['dw_name']
df.loc[mask, 'result'] += 10

df['result'] = df['result'].map({ 0: 'all match',
                                  1: 'flag mismatch',
                                 10: 'name mismatch',
                                 11: 'all mismatch',})
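The comment in the code above hints at a loop over a list of tuples. A minimal sketch of that idea follows. It collects labels instead of adding up scores, then uses DataFrame.explode to get one row per mismatch, as in the expected result. It assumes the frame is already merged and uses the column names from the expected result table; the checks list and the small sample frame are made up for illustration, and the MISSING cases are left out.

```python
import pandas as pd

# Hypothetical, already-merged frame with the column names of the expected result.
df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'excel_name': ['Plate', 'Back washer', 'Ring', 'Glass'],
    'dw_name': ['Plate', 'Nut', 'Ring', 'Top Wire'],
    'excel_flag': ['Ready', 'Sold', 'Planning', 'Ready'],
    'dw_flag': ['Planning', 'Expired', 'Planning', 'Ready'],
    'excel_city': ['NY', 'NY', 'MC', 'NMC'],
    'dw_city': ['NY', 'TN', 'MC', 'NY'],
})

# One tuple per condition: (left column, right column, label).
checks = [
    ('excel_name', 'dw_name', 'NAME_MISMATCH'),
    ('excel_flag', 'dw_flag', 'FLAG_MISMATCH'),
    ('excel_city', 'dw_city', 'CITY_MISMATCH'),
]

# One boolean column per label, True where the two sources disagree.
mismatch_table = pd.DataFrame(
    {label: df[left] != df[right] for left, right, label in checks}
)

# For each row, collect the labels that fired, or ['ALL_MATCH'] if none did.
df['RESULT'] = pd.Series(
    [[label for label, hit in row.items() if hit] or ['ALL_MATCH']
     for _, row in mismatch_table.iterrows()],
    index=df.index,
)

# explode turns each list element into its own row, repeating the other columns.
long_df = df.explode('RESULT').reset_index(drop=True)
print(long_df[['ID', 'RESULT']])
```

Adding another condition only requires another tuple in checks, which avoids the repeated mask and df.loc statements.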
