Python 3.x: How to compare columns in DataFrames to find all entries that appear in a different column?
Full disclosure: I am still fairly new to Python and only discovered pandas today. I created a DataFrame from two CSV files: one is the result of a robot scanning barcode IDs, the other is the list of instructions the robot is to execute.
import pandas as pd
#import csv file and read the column containing plate IDs scanned by Robot
scancsvdata = pd.read_csv(r"G:\scan.csv", header=None, sep=';', skiprows=1, usecols=[6])
#Rename Column to Plates Scanned
scancsvdata.columns = ["IDs Scanned"]
#Remove any Duplicate Plate IDs
scancsvdataunique = scancsvdata.drop_duplicates()
#import the Worklist to be executed CSV file and read the Source Column to find required Plates
worklistdataSrceID = pd.read_csv(r"G:\TestWorklist.CSV", usecols=["SrceID"])
#Rename SrceID Column to Plates Required
worklistdataSrceID.rename(columns={'SrceID':'IDs required'}, inplace=True)
#remove duplicates from Plates Required
worklistdataSrceIDunique = worklistdataSrceID.drop_duplicates()
#import the Worklist to be executed CSV file and read the Destination Column to find required Plates
worklistdataDestID = pd.read_csv(r"G:\TestWorklist.CSV", usecols=["DestID"])
#Rename DestID Column to Plates Required
worklistdataDestID.rename(columns={'DestID':'IDs required'}, inplace=True)
#remove duplicates from Plates Required
worklistdataDestIDunique = worklistdataDestID.drop_duplicates()
#Combine into one Dataframe
AllData = pd.concat([scancsvdataunique, worklistdataSrceIDunique, worklistdataDestIDunique], sort=True)
print(AllData)
The resulting DataFrame lists the scanned IDs in column 1 and the required IDs in column 2:
    IDs Scanned  IDs required
0     1024800.0           NaN
1     1024838.0           NaN
2     1024839.0           NaN
3     1024841.0           NaN
4     1024844.0           NaN
5     1024798.0           NaN
6     1024858.0           NaN
7     1024812.0           NaN
8     1024797.0           NaN
9     1024843.0           NaN
10    1024840.0           NaN
11    1024842.0           NaN
12    1024755.0           NaN
13    1024809.0           NaN
14    1024810.0           NaN
15       8656.0           NaN
16       8657.0           NaN
17       8658.0           NaN
0           NaN     1024800.0
33          NaN     1024843.0
0           NaN        8656.0
7           NaN        8657.0
15          NaN        8658.0
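For anyone who wants to reproduce this layout without the CSV files, here is a minimal sketch. The three small frames are hypothetical stand-ins for the columns read from scan.csv and TestWorklist.CSV, using a handful of the values shown above. It illustrates why the output is padded with NaN: `pd.concat` stacks the rows and aligns on column names, so each row has a value in only one of the two columns.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV columns (a few values from the output above).
scanned = pd.DataFrame({"IDs Scanned": [1024800, 1024843, 8656, 8657, 8658]})
required_src = pd.DataFrame({"IDs required": [1024800, 1024843]})
required_dest = pd.DataFrame({"IDs required": [8656, 8657, 8658]})

# concat stacks the rows and aligns on column names. A row that comes from
# `scanned` has no "IDs required" value (and vice versa), so the gap is
# filled with NaN and the integer IDs are upcast to float.
all_data = pd.concat([scanned, required_src, required_dest], sort=True)
print(all_data)
```

The original row labels are kept as well, which is why index values such as 0 appear more than once.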
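How can I make sure that every ID in the "IDs required" column also appears in the "IDs Scanned" column?
Ideally, the result of this comparison would be a generic message such as "All IDs found".
With different CSV files, the DataFrame looks like this: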
    IDs Scanned  IDs required
0     1024800.0           NaN
1     1024838.0           NaN
2     1024839.0           NaN
3     1024841.0           NaN
4     1024844.0           NaN
5     1024798.0           NaN
6     1024858.0           NaN
7     1024812.0           NaN
8     1024797.0           NaN
9     1024843.0           NaN
10    1024840.0           NaN
11    1024842.0           NaN
12    1024755.0           NaN
13    1024809.0           NaN
14    1024810.0           NaN
15       8656.0           NaN
16       8657.0           NaN
17       8658.0           NaN
0           NaN     2024800.0
33          NaN     2024843.0
0           NaN        8656.0
7           NaN        8657.0
15          NaN        8658.0
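In that case the result of the comparison would be a list of the missing IDs, 2024800 and 2024843.
To get a True/False check of whether all required items are in the scanned column (compare against .values, because `in` on a Series tests the index labels, and drop the NaN padding first):
all(item in df["IDs Scanned"].values for item in df["IDs required"].dropna().unique())
To get a list of the unique missing items:
sorted(set(df["IDs required"].dropna()) - set(df["IDs Scanned"].dropna()))
Or use pandas syntax to return a DataFrame filtered to the rows whose required ID is not found among the scanned IDs:
df.loc[~df["IDs required"].isin(df["IDs Scanned"])]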
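To see how these checks behave on NaN-padded data like the frame in the question, here is a small sketch. The sample frame is made up for illustration (a few IDs from the question plus one that was never scanned), and the variable names are my own. It also shows a pitfall that can make a membership test return False unexpectedly: `in` applied to a Series looks at the index labels, not the values.

```python
import pandas as pd

nan = float("nan")
# Made-up sample shaped like the concatenated frame in the question.
df = pd.DataFrame({
    "IDs Scanned":  [1024800.0, 1024843.0, 8656.0, nan, nan, nan],
    "IDs required": [nan, nan, nan, 2024800.0, 8656.0, 1024843.0],
})

# Pitfall: `in` on a Series checks the index labels (0..5 here), not the values.
in_index = 8656.0 in df["IDs Scanned"]
in_values = 8656.0 in df["IDs Scanned"].values

# Drop the NaN padding before comparing, otherwise NaN shows up as "missing".
required = df["IDs required"].dropna()
scanned = df["IDs Scanned"].dropna()

all_found = bool(required.isin(scanned).all())    # True/False check
missing = sorted(set(required) - set(scanned))    # unique missing IDs

print("All IDs found" if all_found else f"Missing IDs: {missing}")
```

The `dropna()` calls matter for the set version in particular, since NaN never compares equal to itself and would otherwise survive the set difference.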
You can use .isin:
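Adding at least a short explanation of the code would help.
This is great. I just need to work out how to write it to a txt file without getting the "argument must be str, not list" error.
These are all very helpful and I am learning a lot, but I still have some problems. The result always returns False, even when it should be True, I assume because of all the NaN values. Is there a way to fix this? I tried setting all the NaN values to 0, but that had no effect. The list of unique missing items is perfect; I just need to work out how to get it into a text file without running into the "argument must be str, not list" error.
If the missing-items list works, you can turn it into True/False with bool(sorted(set(df["IDs required"].dropna()) - set(df["IDs Scanned"].dropna()))). It returns True if the list contains items and False if the list is empty. To join a list l into a single string, use ",".join(map(str, l)); the map(str, ...) is needed because join only accepts strings and the IDs are floats.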
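On the "argument must be str, not list" error: a file's `write` method only accepts a string, so the list has to be turned into text first. A minimal sketch follows, assuming the missing IDs are already in a list of floats; the list contents, the file name `missing_ids.txt`, and the temp-directory location are placeholders, not anything from the original post.

```python
from pathlib import Path
import tempfile

# Placeholder list, as produced by the sorted(set(...) - set(...)) approach.
missing = [2024800.0, 2024843.0]

# Convert each float ID to a whole-number string; str.join needs strings.
lines = [str(int(x)) for x in missing]

# Hypothetical output location; replace with the path you actually want.
out_path = Path(tempfile.gettempdir()) / "missing_ids.txt"
out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Writing one ID per line keeps the file easy to read back later; use ",".join(lines) instead if a single comma-separated line is preferred.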
missing_ids = df.loc[~df['IDs required'].isin(df['IDs Scanned']), 'IDs required']
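Since an explanation was requested, here is roughly what that line does, as a sketch on made-up data where every required ID was scanned. `isin` builds a boolean mask that is True where the required ID occurs among the scanned IDs, `~` inverts it, and `.loc` keeps only the "IDs required" values of the remaining rows. The extra `dropna()` and the message strings are my additions, not part of the answer above.

```python
import pandas as pd

nan = float("nan")
# Made-up sample shaped like the frame in the question; all required IDs were scanned.
df = pd.DataFrame({
    "IDs Scanned":  [1024800.0, 1024843.0, 8656.0, 8657.0, nan, nan, nan],
    "IDs required": [nan, nan, nan, nan, 1024800.0, 1024843.0, 8656.0],
})

# True where the value in "IDs required" occurs anywhere in "IDs Scanned".
found_mask = df["IDs required"].isin(df["IDs Scanned"])

# Invert the mask to keep the rows that were NOT found; dropna() guards
# against NaN padding being reported as a missing ID.
missing_ids = df.loc[~found_mask, "IDs required"].dropna()

if missing_ids.empty:
    message = "All IDs found"
else:
    message = "Missing IDs: " + ", ".join(str(int(x)) for x in missing_ids)
print(message)
```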