Python: search one DataFrame for values from another DataFrame and populate a new column from the looked-up values
I have two DataFrames, df1 and df2, shown below. I need to search all columns of df1 (columns a through f) for the values in df2['Pid'], and then create a new column df1['ind'] that holds the value of df2['ind'] wherever a df2['Pid'] value is matched in df1. To me it looks like an extended lookup. I used
df2.isin(df1['PERSON_UID'])
to look up values in df1 and flag whether each one was found (True/False), but I am stuck on creating the df1['ind'] column.
df1:
df2:
Desired output:
a b c d e f ind
0 0 2106 0 0 0 n
0 2103 0 0 0 0 n
0 2104 0 0 0 0 y
0 2105 0 0 0 0 n
2100 0 0 0 0 0 y
2101 0 0 0 0 0 n
2102 0 0 0 0 0 y
0 0 2107 0 0 0 n
0 0 2108 0 0 0 y
0 0 2109 0 0 0 y
0 0 2110 0 0 0 n
0 0 2111 0 0 0 y
0 0 0 2112. 0 0 y
0 0 0 2113 0 0 y
0 0 0 2114 0 0 n
0 0 0 0 2115 0 n
0 0 0 0 2116 0 y
0 0 0 0 0 2117 y
0 0 0 0 0 2118 n
0 0 0 0 0 2119 y
0 0 0 0 2120 0 n
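Use DataFrame.mask, ffill, iloc and Series.map.
Details:
First, replace the 0 values with missing values using DataFrame.mask, then forward-fill the missing values along each row (axis=1):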
print (df1.mask(df1.eq(0)).ffill(axis=1))
a b c d e f
0 NaN NaN 2106.0 2106.0 2106.0 2106.0
1 NaN 2103.0 2103.0 2103.0 2103.0 2103.0
2 NaN 2104.0 2104.0 2104.0 2104.0 2104.0
3 NaN 2105.0 2105.0 2105.0 2105.0 2105.0
4 2100.0 2100.0 2100.0 2100.0 2100.0 2100.0
5 2101.0 2101.0 2101.0 2101.0 2101.0 2101.0
6 2102.0 2102.0 2102.0 2102.0 2102.0 2102.0
7 NaN NaN 2107.0 2107.0 2107.0 2107.0
8 NaN NaN 2108.0 2108.0 2108.0 2108.0
9 NaN NaN 2109.0 2109.0 2109.0 2109.0
10 NaN NaN 2110.0 2110.0 2110.0 2110.0
11 NaN NaN 2111.0 2111.0 2111.0 2111.0
12 NaN NaN NaN 2112.0 2112.0 2112.0
13 NaN NaN NaN 2113.0 2113.0 2113.0
14 NaN NaN NaN 2114.0 2114.0 2114.0
15 NaN NaN NaN NaN 2115.0 2115.0
16 NaN NaN NaN NaN 2116.0 2116.0
17 NaN NaN NaN NaN NaN 2117.0
18 NaN NaN NaN NaN NaN 2118.0
19 NaN NaN NaN NaN NaN 2119.0
20 NaN NaN NaN NaN 2120.0 2120.0
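Next, select the last column by position with DataFrame.iloc (iloc[:, -1]); after the forward fill it holds each row's non-zero value.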
Finally, map these values to df2['ind'] with Series.map to create df1['ind'].
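The steps above can be sketched end to end as follows. This is a minimal sketch, not the answer's exact code: the sample rows are hypothetical, with the Pid/ind pairs read off the desired output in the question, and the names cols and pid are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical sample: a few rows shaped like the question's df1 and df2.
df1 = pd.DataFrame(
    {
        "a": [0, 0, 2100, 2101],
        "b": [0, 2104, 0, 0],
        "c": [2106, 0, 0, 0],
        "d": [0, 0, 0, 0],
        "e": [0, 0, 0, 0],
        "f": [0, 0, 0, 0],
    }
)
df2 = pd.DataFrame({"Pid": [2100, 2101, 2104, 2106], "ind": ["y", "n", "y", "n"]})

cols = ["a", "b", "c", "d", "e", "f"]

# 1) turn 0 into NaN, 2) forward-fill along each row, 3) take the last column.
pid = df1[cols].mask(df1[cols].eq(0)).ffill(axis=1).iloc[:, -1]

# 4) look up each value in df2; values with no matching Pid become NaN.
df1["ind"] = pid.map(df2.set_index("Pid")["ind"])
print(df1)
```

Restricting the search to cols keeps the lookup from picking up the new ind column if the code is run twice. Series.map with a Series argument expects unique Pid values in df2.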
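@jezrael's answer is perfect. If Pid is not duplicated, I think all you need is the row sum, which combines the columns into a single key:
import pandas as pd

# df is assumed to be the question's df1; with one non-zero value per row, the row sum is that value
df['Pid'] = df.sum(axis=1)
df['Pid'] = df['Pid'].astype(int)
# inner merge keeps only the rows whose Pid appears in df2
df = pd.merge(df, df2, on='Pid', how='inner')
df.drop('Pid', axis=1, inplace=True)
df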
a b c d e f ind
0 0 0 2106 0.0 0 0 n
1 0 2103 0 0.0 0 0 n
2 0 2104 0 0.0 0 0 y
3 0 2105 0 0.0 0 0 n
4 2100 0 0 0.0 0 0 y
5 2101 0 0 0.0 0 0 n
6 2102 0 0 0.0 0 0 y
7 0 0 2107 0.0 0 0 n
8 0 0 2108 0.0 0 0 y
9 0 0 2109 0.0 0 0 y
10 0 0 2110 0.0 0 0 n
11 0 0 2111 0.0 0 0 y
12 0 0 0 2112.0 0 0 y
13 0 0 0 2113.0 0 0 y
14 0 0 0 2114.0 0 0 n
15 0 0 0 0.0 2115 0 n
16 0 0 0 0.0 2116 0 y
17 0 0 0 0.0 0 2117 y
18 0 0 0 0.0 0 2118 n
19 0 0 0 0.0 0 2119 y
20 0 0 0 0.0 2120 0 n
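A possible variant of the sum-and-merge approach, in case rows without a match should be kept rather than dropped. It is a sketch under assumptions: the sample data is hypothetical, each row is assumed to contain exactly one non-zero value, and the names key and out are illustrative.

```python
import pandas as pd

# Hypothetical sample data; the last row holds a value that is absent from df2.
df = pd.DataFrame({"a": [2100, 0, 0], "b": [0, 2104, 0], "c": [0, 0, 2999]})
df2 = pd.DataFrame({"Pid": [2100, 2104], "ind": ["y", "y"]})

# Assumption: exactly one non-zero value per row, so the row sum equals it.
key = df.sum(axis=1).astype(int).rename("Pid")

# how="left" keeps unmatched rows (ind becomes NaN);
# validate="many_to_one" raises if a Pid appears more than once in df2.
out = (
    df.join(key)
    .merge(df2, on="Pid", how="left", validate="many_to_one")
    .drop(columns="Pid")
)
print(out)
```

If a row could contain more than one non-zero value, the sum would no longer equal any single Pid, and the mask/ffill approach from the first answer would be the safer choice.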
Thank you very much! It works great! I just used pandas.merge instead of Series.map. Works fine.
Output of the last-column selection step (iloc[:, -1]) from the first answer:
print (df1.mask(df1.eq(0)).ffill(axis=1).iloc[:, -1])
0 2106.0
1 2103.0
2 2104.0
3 2105.0
4 2100.0
5 2101.0
6 2102.0
7 2107.0
8 2108.0
9 2109.0
10 2110.0
11 2111.0
12 2112.0
13 2113.0
14 2114.0
15 2115.0
16 2116.0
17 2117.0
18 2118.0
19 2119.0
20 2120.0
Name: f, dtype: float64