Python: checking multiple conditions on the same row

I have to compare two different sources and find all the mismatches for every id.

Source_excel table

+-----+-------------+------+----------+
| id  | name        | City | flag     |
+-----+-------------+------+----------+
| 101 | Plate       | NY   | Ready    |
| 102 | Back washer | NY   | Sold     |
| 103 | Ring        | MC   | Planning |
| 104 | Glass       | NMC  | Ready    |
| 107 | Cover       | PR   | Ready    |
+-----+-------------+------+----------+

Source_dw table

+-----+----------+------+----------+
| id  | name     | City | flag     |
+-----+----------+------+----------+
| 101 | Plate    | NY   | Planning |
| 102 | Nut      | TN   | Expired  |
| 103 | Ring     | MC   | Planning |
| 104 | Top Wire | NY   | Ready    |
| 105 | Bolt     | MC   | Expired  |
+-----+----------+------+----------+

Expected result

+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID  | excel_name  | dw_name  | excel_flag | dw_flag  | excel_city | dw_city | RESULT           |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate       | Plate    | Ready      | Planning | NY         | NY      | FLAG_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | NAME_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | FLAG_MISMATCH    |
| 102 | Back washer | Nut      | Sold       | Expired  | NY         | TN      | CITY_MISMATCH    |
| 103 | Ring        | Ring     | Planning   | Planning | MC         | MC      | ALL_MATCH        |
| 104 | Glass       | Top Wire | Ready      | Ready    | NMC        | NY      | NAME_MISMATCH    |
| 104 | Glass       | Top Wire | Ready      | Ready    | NMC        | NY      | CITY_MISMATCH    |
| 107 | Cover       |          | Ready      |          | PR         |         | MISSING IN DW    |
| 105 |             | Bolt     |            | Expired  |            | MC      | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+

I am new to Python. I tried the code below, but it does not give the expected result:

import pandas as pd

source_excel = pd.read_csv('C:/Mypython/Newyork/excel.csv',encoding = "ISO-8859-1")
source_dw = pd.read_csv('C:/Mypython/Newyork/dw.csv',encoding = "ISO-8859-1")
comparison_result = pd.merge(source_excel,source_dw,on='ID',how='outer',indicator=True)

 
comparison_result.loc[(comparison_result['_merge'] == 'both') & (name_x != name_y), 'Result'] = 'NAME_MISMATCH' 
comparison_result.loc[(comparison_result['_merge'] == 'both') & (city_x != city_y), 'Result'] = 'CITY_MISMATCH'
comparison_result.loc[(comparison_result['_merge'] == 'both') & (flag_x != flag_y), 'Result'] = 'FLAG_MISMATCH'     
comparison_result.loc[comparison_result['_merge'] == 'left_only', 'Result'] = 'Missing in dw'  
comparison_result.loc[comparison_result['_merge'] == 'right_only', 'Result'] = 'Missing in excel'  
comparison_result.loc[comparison_result['_merge'] == 'both', 'Result'] = 'ALL_Match' 

csv_column = comparison_result[['ID','name_x','name_y','city_x','city_y','flag_x','flag_y','Result']]
print(csv_column)
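
Is there another way to check all the conditions and report each one in a separate row? If separate rows are not possible, I at least need all the mismatches in the same column, separated by a delimiter, something like FLAG_MISMATCH,CITY_MISMATCH.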

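You can do the following:

# how='left' keeps only the IDs present in source_excel;
# use how='outer' if IDs that exist only in source_dw must be kept as well
df = pd.merge(source_excel, source_dw, on = 'ID', how = 'left', suffixes = (None, '_dw'))

This creates a new DataFrame close to the one you want, although you will have to reorder the columns as needed. Note that '_dw' is a suffix here, not a prefix, so the columns coming from source_dw are named name_dw, flag_dw and city_dw, while the columns from source_excel keep their original names.

You can reorder the columns with the following code:

#Complete the list with the order you want
df = df[['ID', 'name', 'name_dw', 'flag', 'flag_dw', 'city', 'city_dw']]

For the result column, I think you have to create one column for each condition you want to check (at least that is the way I know how to do it). Here is an example:

#This will return 1 if there's a match and 0 otherwise
df['result_flag'] = df.apply(lambda x: 1 if x.flag == x.flag_dw else 0, axis = 1)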
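
As a possible way to get one row per mismatch, as in the expected result of the question, the merge above can be combined with an outer join, the merge indicator and DataFrame.explode. This is only a sketch under assumptions: the sample frames are typed in from the question's tables with the columns normalised to ID, name, city and flag, the suffixes '_excel' and '_dw' are chosen here for readability, and classify is a hypothetical helper, not something from the original answer.

```python
import pandas as pd

# Sample data typed in from the question's tables.
# Assumption: column names normalised to ID / name / city / flag.
source_excel = pd.DataFrame({
    "ID": [101, 102, 103, 104, 107],
    "name": ["Plate", "Back washer", "Ring", "Glass", "Cover"],
    "city": ["NY", "NY", "MC", "NMC", "PR"],
    "flag": ["Ready", "Sold", "Planning", "Ready", "Ready"],
})
source_dw = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "name": ["Plate", "Nut", "Ring", "Top Wire", "Bolt"],
    "city": ["NY", "TN", "MC", "NY", "MC"],
    "flag": ["Planning", "Expired", "Planning", "Ready", "Expired"],
})

# Outer join keeps IDs that exist on only one side; indicator=True adds
# a '_merge' column saying where each row came from.
merged = pd.merge(source_excel, source_dw, on="ID", how="outer",
                  suffixes=("_excel", "_dw"), indicator=True)


def classify(row):
    """Hypothetical helper: return the list of labels for one merged row."""
    if row["_merge"] == "left_only":
        return ["MISSING IN DW"]
    if row["_merge"] == "right_only":
        return ["MISSING IN EXCEL"]
    labels = [f"{col.upper()}_MISMATCH"
              for col in ("name", "flag", "city")
              if row[f"{col}_excel"] != row[f"{col}_dw"]]
    return labels or ["ALL_MATCH"]


# One list of labels per row, then explode so each label gets its own row.
merged["RESULT"] = pd.Series(
    [classify(row) for _, row in merged.iterrows()], index=merged.index
)
result = (merged.explode("RESULT")
                .drop(columns="_merge")
                .sort_values("ID", kind="stable")
                .reset_index(drop=True))
print(result)
```

If a single row per ID is preferred, replacing the explode step with merged["RESULT"].str.join(",") should give the comma-separated form mentioned in the question.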

Here is a scoring approach:

df['result'] = 0

# the repeated mask / df.loc statements suggest a loop over a list of tuples
# columns are assumed to be named as in the expected result (excel_flag, dw_flag, ...)
mask = df['excel_flag'] != df['dw_flag']
df.loc[mask, 'result'] += 1

mask = df['excel_name'] != df['dw_name']
df.loc[mask, 'result'] += 10

df['result'] = df['result'].map({ 0: 'all match',
                                  1: 'flag mismatch',
                                 10: 'name mismatch',
                                 11: 'all mismatch',})
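
The comment in the snippet above hints at a loop over a list of tuples. One way that could look is sketched below; it builds a comma-separated label instead of a numeric score, which matches the fallback format asked for in the question. The small sample frame and the checks list are assumptions made for illustration, using the column names from the expected result table.

```python
import pandas as pd

# Small sample with the column names used in the expected result table.
# Assumption: values typed in from the question for IDs 101-103.
df = pd.DataFrame({
    "ID": [101, 102, 103],
    "excel_name": ["Plate", "Back washer", "Ring"],
    "dw_name": ["Plate", "Nut", "Ring"],
    "excel_flag": ["Ready", "Sold", "Planning"],
    "dw_flag": ["Planning", "Expired", "Planning"],
})

# One tuple per condition: (excel column, dw column, label to report).
checks = [
    ("excel_name", "dw_name", "NAME_MISMATCH"),
    ("excel_flag", "dw_flag", "FLAG_MISMATCH"),
]

df["result"] = ""
for excel_col, dw_col, label in checks:
    mask = df[excel_col] != df[dw_col]
    # Append the label plus a separator on every row where the pair differs.
    df.loc[mask, "result"] += label + ","

# Drop the trailing separator; rows with no label at all matched everywhere.
df["result"] = df["result"].str.rstrip(",").replace("", "ALL_MATCH")
print(df[["ID", "result"]])
```

Adding another condition, such as the city columns, then only needs one more tuple in checks rather than another mask / df.loc pair.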