Python: left join produces more rows than the left DataFrame

python pandas dataframe merge

My left join results in more rows than there are in the left DataFrame.

# Import pandas under the alias pd
import numpy as np
import pandas as pd

SalesDF = pd.read_csv(r"C:\Users\USER\Documents\Reports\SalesForAnalysis.csv")
print("This is the Sales shape")
print(SalesDF.shape)


CustInfoDF = pd.read_csv(r"C:\Users\USER\Documents\Cust.csv")

# Reassign the DataFrame so that rows with NaN in "Account Number" are dropped
CustInfoDF = CustInfoDF[CustInfoDF['Account Number'].notna()]


# Merges the two dataframes on SalesDF with "Cust Number" as the key
MergeDF = pd.merge(SalesDF, CustInfoDF, how="left", left_on="Cust Number", right_on="Account Number")

print("This is the Merge Shape ")
print(MergeDF.shape)

# Reduced the number of columns to the selected columns
CutDF = MergeDF[["Customer", "Invoice #", "E-mail Address", "Phone", "Clerk", "Total", "Date"]]

CutDF.drop_duplicates()

print("This is the Cut shape ")
print(CutDF.shape)

Here is the output after running the program:

This is the Sales shape
(5347, 61)
This is the Merge Shape 
(6428, 83)
This is the Cut shape 
(6428, 7)

Process finished with exit code 0

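CutDF should have at most 5347 rows. I call the drop_duplicates method, but I still get the same result.

I saw this, but I did not find a solution to my problem there.

Any help would be greatly appreciated.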

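Before running:

MergeDF = pd.merge(SalesDF, CustInfoDF, how="left", left_on="Cust Number", right_on="Account Number")

you can do:

CustInfoDF = CustInfoDF.drop_duplicates(subset=["Account Number"])

I suspect your CustInfoDF has multiple entries for each account number. A left join returns one row for every matching pair, so any sale whose account number appears more than once in CustInfoDF is repeated in the result.

If this does not work, could you post sample DataFrames? Feel free to add or substitute dummy values, as long as the code is reproducible.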
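To illustrate the suspicion above, here is a minimal sketch with made-up data. The frames `sales` and `cust` are hypothetical stand-ins for SalesDF and CustInfoDF; only the column names used as join keys come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for SalesDF and CustInfoDF (all values are made up).
sales = pd.DataFrame({
    "Cust Number": [1, 2, 3],
    "Total": [10.0, 20.0, 30.0],
})
cust = pd.DataFrame({
    "Account Number": [1, 1, 2],  # account 1 appears twice
    "Phone": ["555-0100", "555-0101", "555-0102"],
})

# Each left row is paired with every matching right row,
# so the sale for customer 1 shows up twice in the result.
merged = pd.merge(sales, cust, how="left",
                  left_on="Cust Number", right_on="Account Number")
print(merged.shape)  # (4, 4)

# Keeping one row per account gives one output row per sale again.
cust_unique = cust.drop_duplicates(subset=["Account Number"])
merged_unique = pd.merge(sales, cust_unique, how="left",
                         left_on="Cust Number", right_on="Account Number")
print(merged_unique.shape)  # (3, 4)
```

By default drop_duplicates keeps the first occurrence, so which phone number survives for account 1 depends on row order. Note also that drop_duplicates returns a new DataFrame, so a bare CutDF.drop_duplicates() as in the question leaves CutDF unchanged unless the result is assigned back.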
What is SalesDF['Cust Number'].duplicated().any()?

I'm afraid I don't understand the question. Could you please rephrase it?

You were right that there are duplicate account numbers. When I have time, I will work through this more carefully. After adding CustInfoDF = CustInfoDF.drop_duplicates(on=["Account Number"]) before my MergeDF is formed, my new output is:

This is the Sales shape
(5104, 62)
Traceback (most recent call last):
  File "C:\Users\USER\PycharmProjects\PandasStuff\PyMailReport.py", line 18, in <module>
    CustInfoDF = CustInfoDF.drop_duplicates(on=["Account Number"])
TypeError: drop_duplicates() got an unexpected keyword argument 'on'

Process finished with exit code 1

Final comment: thank you, the problem has been solved. The on= in the drop_duplicates call was not needed, and that is what was throwing the program off, but now it seems to work fine. Thanks a lot.

Yes, sorry about that: on should be subset. I have updated the answer in case anyone uses it in the future.
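The duplicated().any() check asked about in the comments can be run on both key columns. The sketch below uses made-up frames (`sales` and `cust` are hypothetical) and also shows the `validate` argument of pd.merge, which is not used in the thread but is one way to have pandas raise an error rather than silently multiply rows.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: duplicate keys on both sides.
sales = pd.DataFrame({"Cust Number": [1, 2, 2], "Total": [10.0, 20.0, 5.0]})
cust = pd.DataFrame({"Account Number": [1, 1, 2], "Phone": ["a", "b", "c"]})

# True when any key value occurs more than once in the column.
left_has_dupes = sales["Cust Number"].duplicated().any()
right_has_dupes = cust["Account Number"].duplicated().any()
print(left_has_dupes, right_has_dupes)

# validate="many_to_one" asks pandas to verify that the right-hand keys
# are unique and to raise MergeError if they are not.
try:
    pd.merge(sales, cust, how="left", left_on="Cust Number",
             right_on="Account Number", validate="many_to_one")
    merge_checked = True
except MergeError:
    merge_checked = False

# The keyword is subset=, not on=; after deduplicating, the check passes.
cust_unique = cust.drop_duplicates(subset=["Account Number"])
checked = pd.merge(sales, cust_unique, how="left", left_on="Cust Number",
                   right_on="Account Number", validate="many_to_one")
print(len(checked) == len(sales))
```

Repeated values in the left key (several sales for one customer) do not inflate the row count of a left join. Only repeated keys on the right side do, which is why the fix targets CustInfoDF.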