Pandas: how to create a new column on DF2 based on two DF1 columns

pandas dataframe

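I have two DataFrames with different data, and I need to add a new column to DF2 based on information in two columns of DF1. In the example below, I need to find all entries that have the same City and DOB values in both DataFrames and add a new column to DF1 indicating YES or NO.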

DF1:
City      DOB        Gender      Test
NYC   01/05/1990       F        Positive
NYC   01/06/1991       M        Negative
LA    12/01/1980       F        Negative
BOS   11/11/1987       M        Positive

DF2:
City      DOB        Gender
NYC   01/05/1990       F        
NYC   04/22/1980       M        
LA    12/01/1980       F        
BOS   07/18/1984       M

So my expected output is:

DF1'
City      DOB        Gender      Test        New_column
NYC   01/05/1990       F        Positive        YES
NYC   01/06/1991       M        Negative        NO
LA    12/01/1980       F        Negative        YES
BOS   11/11/1987       M        Positive        NO

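The closest I got was with the code below, but it only searches one DF1 column and not the other (in my case, it adds the value YES to every entry that has the same DOB).

Is there a way to do this with pandas? I have a very large dataset, and this code would save me some time.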

You can use numpy.where:

import numpy as np

rule = (df1["City"] == df2["City"]) & (df1["DOB"] == df2["DOB"])
df1["new_column"] = np.where(rule, "YES", "NO")

print(df1)
  City         DOB Gender      Test new_column
0  NYC  01/05/1990      F  Positive        YES
1  NYC  01/06/1991      M  Negative         NO
2   LA  12/01/1980      F  Negative        YES
3  BOS  11/11/1987      M  Positive         NO
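
The comparison above matches rows by position, so it relies on both frames having the same index and row order. If that is not guaranteed, one possible alternative is to test whether each (City, DOB) pair of df1 occurs anywhere in df2. The sketch below rebuilds the sample frames from the question, with df2 deliberately reordered as an assumption for illustration, and uses MultiIndex.from_frame together with MultiIndex.isin:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "BOS"],
    "DOB": ["01/05/1990", "01/06/1991", "12/01/1980", "11/11/1987"],
    "Gender": ["F", "M", "F", "M"],
    "Test": ["Positive", "Negative", "Negative", "Positive"],
})
# Same rows as DF2 in the question, listed in a different order on purpose.
df2 = pd.DataFrame({
    "City": ["BOS", "LA", "NYC", "NYC"],
    "DOB": ["07/18/1984", "12/01/1980", "04/22/1980", "01/05/1990"],
    "Gender": ["M", "F", "M", "F"],
})

keys = ["City", "DOB"]

# Turn the key columns of each frame into a MultiIndex of (City, DOB) pairs.
df1_keys = pd.MultiIndex.from_frame(df1[keys])
df2_keys = pd.MultiIndex.from_frame(df2[keys])

# True where a df1 pair appears somewhere in df2, regardless of row position.
in_df2 = df1_keys.isin(df2_keys)

df1["new_column"] = np.where(in_df2, "YES", "NO")
print(df1)
```

Because the check is a membership test on the pair, it does not require the two frames to have the same length either.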

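I'm not sure how large the data is or what constraints apply, but here is a solution using join:

import itertools

df3 = (
    df2.set_index(["City", "DOB"])
    .join(
        df1.set_index(["City", "DOB"])
        .drop("Gender", axis="columns")
        .assign(
            # Labels df1 rows by position (yes, no, yes, no, ...);
            # this happens to fit the sample data.
            new_column=list(
                itertools.islice(itertools.cycle(["yes", "no"]), df1.shape[0])
            )
        )
    )
    .reset_index()
)

Which outputs: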

  City         DOB Gender      Test new_column
0  NYC  01/05/1990      F  Positive        yes
1  NYC  04/22/1980      M       NaN        NaN
2   LA  12/01/1980      F  Negative        yes
3  BOS  07/18/1984      M       NaN        NaN

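You can have some fun with this by directly comparing the two-column subsets of the DataFrames and using the sum of the booleans as the condition (whether it equals 2). Then replace the boolean values with YES or NO. This approach assumes the values are in the same order and the frames have the same length.
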
DF1['New_column'] = ((DF1[['City', 'DOB']] == DF2[['City', 'DOB']]).sum(axis=1) == 2).replace([True,False], ['YES', 'NO'])

    City        DOB Gender      Test    New_column
0   NYC  01/05/1990      F  Positive           YES
1   NYC  01/06/1991      M  Negative            NO
2   LA   12/01/1980      F  Negative           YES
3   BOS  11/11/1987      M  Positive            NO
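
To make the same-order assumption concrete, here is a small sketch that applies the positional comparison twice: once against DF2 as given in the question, and once against the same rows in reverse order. The helper name positional_flags and the reversed frame are hypothetical, introduced only for this illustration, and map is used in place of replace to turn the booleans into labels.

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "BOS"],
    "DOB": ["01/05/1990", "01/06/1991", "12/01/1980", "11/11/1987"],
})
df2 = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "BOS"],
    "DOB": ["01/05/1990", "04/22/1980", "12/01/1980", "07/18/1984"],
})

# Same rows as df2, reversed, with a fresh 0..3 index so the frames
# are still identically labeled and can be compared element-wise.
df2_reversed = df2.iloc[::-1].reset_index(drop=True)


def positional_flags(left, right):
    """Compare City and DOB row by row; a row matches only if both are equal."""
    both_equal = (left[["City", "DOB"]] == right[["City", "DOB"]]).sum(axis=1) == 2
    return both_equal.map({True: "YES", False: "NO"}).tolist()


aligned = positional_flags(df1, df2)
reordered = positional_flags(df1, df2_reversed)

print(aligned)
print(reordered)
```

With the reversed frame every row comes back as NO, even though two of the (City, DOB) pairs do exist in DF2, which is why this technique only suits frames whose rows already line up.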

I think you should also include the Test information in the output.

Thanks! I also noticed that the OP changes df1 in their output, not df2.

Unfortunately, the DataFrames are not in the same order, so I don't think this approach will work. Thanks anyway.
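
For the situation raised in the last comment, where the two frames are not in the same order, a left merge on both key columns is one option worth trying. This is a minimal sketch, not one of the answers above: it assumes the column names from the question, reorders DF2 for illustration, and relies on the indicator parameter of DataFrame.merge, which adds a _merge column recording where each row was found.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "BOS"],
    "DOB": ["01/05/1990", "01/06/1991", "12/01/1980", "11/11/1987"],
    "Gender": ["F", "M", "F", "M"],
    "Test": ["Positive", "Negative", "Negative", "Positive"],
})
# DF2 from the question, in a different row order (assumed for illustration).
df2 = pd.DataFrame({
    "City": ["LA", "NYC", "BOS", "NYC"],
    "DOB": ["12/01/1980", "04/22/1980", "07/18/1984", "01/05/1990"],
    "Gender": ["F", "M", "M", "F"],
})

keys = ["City", "DOB"]

# Keep one row per (City, DOB) pair so the merge cannot duplicate df1 rows.
df2_keys = df2[keys].drop_duplicates()

# how="left" keeps every df1 row in its original order;
# indicator=True adds "_merge" with "both" or "left_only".
merged = df1.merge(df2_keys, on=keys, how="left", indicator=True)

merged["New_column"] = np.where(merged["_merge"] == "both", "YES", "NO")
result = merged.drop(columns="_merge")
print(result)
```

The result keeps the Test column from df1, which also covers the first comment's request to include that information in the output.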