How to find identical values in a Python DataFrame and flag them in a separate column?

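I have a set of columns that I want to concatenate, and then find which values in the concatenated column are the same. I wrote some code, but my DataFrame is too large and the exercise takes too long to finish.

Here is what I did: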

import pandas as pd

l = [[1,'a','b','c','d'],[2,'a','c','c','d'],[3,'a','c','c','d'],[4,'a','b','b','d'],[5,'a','c','c','d']]
df = pd.DataFrame(l,columns = ['Serial No','one','two','three','four'])


df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]
df['Yes/No'] = ""


for i in range(0, df.shape[0]):
    for j in range(0, df.shape[0]):
        if (i != j):
            if (df.iloc[i,df.shape[1]-2] == df.iloc[j,df.shape[1]-2]):
                df.iloc[i,df.shape[1]-1] = "yes"

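This works on smaller DataFrames but takes a long time on larger ones. Is there a more efficient way to produce the same result?

You can use broadcasting to avoid one of the loops: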

import pandas as pd

l = [[1,'a','b','c','d'],[2,'a','c','c','d'],[3,'a','c','c','d'],[4,'a','b','b','d'],[5,'a','c','c','d']]
df = pd.DataFrame(l,columns = ['Serial No','one','two','three','four'])


df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]
df['Yes/No'] = ""

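for i in range(df.shape[0]):
    # Compare row i's key against the whole column at once. The row always
    # matches itself, so a duplicate needs more than one match.
    n_eq = (df.iloc[i, -2] == df["Conc"]).sum()
    df.iloc[i, -1] = 'yes' if n_eq > 1 else 'no'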
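
The remaining loop still compares one row against the whole column on every iteration, so the work stays quadratic in the number of rows. As a minimal sketch of a loop-free alternative (not part of the answer above), each key can be counted once with `Series.value_counts` and the counts mapped back onto the rows. The names `rows` and `counts` are illustrative only.

```python
import pandas as pd

rows = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'],
        [4, 'a', 'b', 'b', 'd'], [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(rows, columns=['Serial No', 'one', 'two', 'three', 'four'])

df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]

# Count each distinct key once, then look up every row's key in that table.
counts = df["Conc"].map(df["Conc"].value_counts())

# A key seen more than once means the row has at least one twin.
df["Yes/No"] = (counts > 1).map({True: "yes", False: "no"})
```

On the sample data only the row with Serial No 4 has a unique key, so it is the only one marked 'no'.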

I think this is a faster solution:

import pandas as pd

l = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'], [4, 'a', 'b', 'b', 'd'],
     [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(l, columns=['Serial No', 'one', 'two', 'three', 'four'])

df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]
df['Yes/No'] = ""

df['Yes/No'] = df.duplicated(["Conc"], keep=False)
df = df.replace({'Yes/No': {True: "Yes", False: "No"}})
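
If the concatenated column is only needed for this comparison, it may be possible to skip it. The sketch below is a variation on the answer above, under that assumption: it passes the three source columns to `DataFrame.duplicated` through `subset` and builds the labels with `numpy.where`. The names `rows`, `key_cols`, and `is_dup` are illustrative only.

```python
import numpy as np
import pandas as pd

rows = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'],
        [4, 'a', 'b', 'b', 'd'], [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(rows, columns=['Serial No', 'one', 'two', 'three', 'four'])

# Columns whose combined values define "the same" row.
key_cols = ["one", "three", "four"]

# keep=False marks every member of a duplicated group, not just the repeats.
is_dup = df.duplicated(subset=key_cols, keep=False)

# Turn the boolean mask into the Yes/No labels in one vectorized step.
df["Yes/No"] = np.where(is_dup, "Yes", "No")
```

Comparing the columns directly also avoids false matches that a separator-based string key could produce if a value itself contained the separator.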

Could you give an example of the expected output, and say what is used as the reference?