Python: if two given columns match, find similar elements across rows to create a new column
My DataFrame looks like this:
df =
query subject HPSame
0 cat dog HPS_1
1 cat horse HPS_2
2 king queen HPS_3
3 queen people HPS_4
4 CAR VAN HPS_5
5 dog tiger HPS_6
6 CAR TRUCK HPS_7
7 horse deer HPS_8
8 CAR JEEP HPS_9
9 TRUCK LORRY HPS_10
10 VAN TRAIN HPS_11
11 people children HPS_12
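In df, each query is similar to its subject, i.e. cat is similar to dog, so that row is labeled HPS_1. In addition, cat is similar to horse and dog is similar to tiger, so those rows should get the same match label, HPS_1. I want to find similar elements, as in a = b = c = d, and give them the same label in a new column. I have tried to simplify my problem: the subject and query values are really alphanumeric identifiers, e.g. WP_020314852.1 = WP_004217899.1 = WP_150395973.1 denotes the same type. The expected result is: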
df =
query subject HPSame match
0 cat dog HPS_1 HPS_1
1 cat horse HPS_2 HPS_1
2 king queen HPS_3 HPS_3
3 queen people HPS_4 HPS_3
4 CAR VAN HPS_5 HPS_5
5 dog tiger HPS_6 HPS_1
6 CAR TRUCK HPS_7 HPS_5
7 horse deer HPS_8 HPS_1
8 CAR JEEP HPS_9 HPS_5
9 TRUCK LORRY HPS_10 HPS_5
10 VAN TRAIN HPS_11 HPS_5
11 people children HPS_12 HPS_3
I tried:
df['query_s'] = df['query'].shift(-1)
df['HPSame_s'] = df['HPSame'].shift(-1)
condition = [(df['query'] == df['query_s'])]
ifTrue = df['HPSame']
ifFalse = df['HPSame_s']
df['match'] = np.where(condition, ifTrue, ifFalse)
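This throws ValueError: Length of values does not match length of index.

We can do this with networkx, treating each query/subject pair as an edge of a graph and giving every connected group of nodes a single label: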
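A minimal sketch of the graph approach, assuming networkx is installed and that each group takes the HPSame value of its first row (an assumption inferred from the expected output, not something the question states). The names node_label and component are illustrative only.

```python
import networkx as nx
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "query":   ["cat", "cat", "king", "queen", "CAR", "dog",
                "CAR", "horse", "CAR", "TRUCK", "VAN", "people"],
    "subject": ["dog", "horse", "queen", "people", "VAN", "tiger",
                "TRUCK", "deer", "JEEP", "LORRY", "TRAIN", "children"],
})
df["HPSame"] = [f"HPS_{i}" for i in range(1, len(df) + 1)]

# Each row becomes an undirected edge query <-> subject.
G = nx.from_pandas_edgelist(df, source="query", target="subject")

# Map every node to the label of the first row whose query is in its component
# (assumed labeling rule; every component contains at least one query value).
node_label = {}
for component in nx.connected_components(G):
    first_row_label = df.loc[df["query"].isin(component), "HPSame"].iloc[0]
    for node in component:
        node_label[node] = first_row_label

df["match"] = df["query"].map(node_label)
print(df)
```

Connected components capture the transitive a = b = c = d relation directly, so no row shifting or pairwise comparison is needed.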
Output:
query subject HPSame match
0 cat dog HPS_1 HPS_1
1 cat horse HPS_2 HPS_1
2 king queen HPS_3 HPS_3
3 queen people HPS_4 HPS_3
4 CAR VAN HPS_5 HPS_5
5 dog tiger HPS_6 HPS_1
6 CAR TRUCK HPS_7 HPS_5
7 horse deer HPS_8 HPS_1
8 CAR JEEP HPS_9 HPS_5
9 TRUCK LORRY HPS_10 HPS_5
10 VAN TRAIN HPS_11 HPS_5
11 people children HPS_12 HPS_3
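If networkx is not available, the same grouping can be sketched with a small union-find in plain Python. This is an alternative technique, not the method used above; label_groups is a hypothetical helper name, and it assumes each group should take the label of the first row that belongs to it.

```python
def label_groups(queries, subjects, labels):
    """Return, for every row, the first label seen in that row's group."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union step: link the two ends of every (query, subject) pair.
    for q, s in zip(queries, subjects):
        root_q, root_s = find(q), find(s)
        if root_q != root_s:
            parent[root_s] = root_q

    # The first row that reaches a group decides that group's label.
    first_label = {}
    for q, lab in zip(queries, labels):
        first_label.setdefault(find(q), lab)

    return [first_label[find(q)] for q in queries]


queries = ["cat", "cat", "king", "queen", "CAR", "dog",
           "CAR", "horse", "CAR", "TRUCK", "VAN", "people"]
subjects = ["dog", "horse", "queen", "people", "VAN", "tiger",
            "TRUCK", "deer", "JEEP", "LORRY", "TRAIN", "children"]
labels = [f"HPS_{i}" for i in range(1, len(queries) + 1)]

match = label_groups(queries, subjects, labels)
print(match)
```

With the DataFrame from the question, the call would be df['match'] = label_groups(df['query'], df['subject'], df['HPSame']).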
Plot of the graph network built from the DataFrame:
import matplotlib.pyplot as plt
import networkx as nx
fig, ax = plt.subplots(figsize=(10,8))
nx.draw_networkx(G, node_color='y', ax=ax)  # G is the graph built from df
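I may be slow, but I don't understand "cat is similar to dog, so it is labeled HPS_1. In addition, cat is similar to horse and dog is similar to tiger, so those rows should get the same match label, HPS_1." How do you define similarity? How is the match column computed?

Say cat is a protein with ID WP_120314582.1 and dog is another protein with ID WP_13242761.5. The two proteins are 100% similar, so they should have the same name even though they have different IDs.

Thank you very much!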