Python: Using Pandas to replace names with indices from a separate file

python indexing pandas

I have a node list and an edge list, as follows:

Id   Label   Type
1   fie      gnome
2   fou      giant
3   fim      gnome
4   fee      dwarf

Source   target  Weight
fie   fou   2
fie   fim   2
fou   fee   2
fee   fim   3

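How can I replace the names in the Source and target columns with the corresponding Id from the nodes file?

The final output should be: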

Source target   Weight
1      2        2
1      3        2
2      4        2
4      3        3

I would probably build a dict from nodes.Label and nodes.Id, and then pass it to replace() or applymap. For example:

>>> weight.stack().replace(dict(zip(nodes.Label, nodes.Id))).unstack()
  Source target Weight
0      1      2      2
1      1      3      2
2      2      4      2
3      4      3      3
>>> d = dict(zip(nodes.Label, nodes.Id))
>>> weight.applymap(lambda x: d.get(x,x))
   Source  target  Weight
0       1       2       2
1       1       3       2
2       2       4       2
3       4       3       3

Some explanation. First, we start with the two DataFrames:

>>> nodes
   Id Label   Type
0   1   fie  gnome
1   2   fou  giant
2   3   fim  gnome
3   4   fee  dwarf
>>> weight
  Source target  Weight
0    fie    fou       2
1    fie    fim       2
2    fou    fee       2
3    fee    fim       3
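To follow along, the two frames can be rebuilt from the text shown in the question. This is only a sketch: it assumes the files are whitespace-delimited and reads them from in-memory strings rather than from real file paths, which the question does not give.

```python
import io

import pandas as pd

# Sample data copied from the question; real code would pass file paths instead.
nodes_txt = """Id Label Type
1 fie gnome
2 fou giant
3 fim gnome
4 fee dwarf
"""
edges_txt = """Source target Weight
fie fou 2
fie fim 2
fou fee 2
fee fim 3
"""

# sep=r"\s+" splits on any run of whitespace (an assumption about the file format).
nodes = pd.read_csv(io.StringIO(nodes_txt), sep=r"\s+")
weight = pd.read_csv(io.StringIO(edges_txt), sep=r"\s+")
```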

Then we build the dict we want to use for the replacement:

>>> d = dict(zip(nodes.Label, nodes.Id))
>>> d
{'fou': 2, 'fim': 3, 'fee': 4, 'fie': 1}
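One caveat with dict(zip(...)): if a label appeared twice in the nodes file, the later Id would silently win. The question's data has unique labels, so this is only a defensive sketch; the duplicate check is an addition, not part of the original answer.

```python
import pandas as pd

nodes = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Label": ["fie", "fou", "fim", "fee"],
    "Type": ["gnome", "giant", "gnome", "dwarf"],
})

# Refuse to build the lookup if any label is ambiguous.
if nodes["Label"].duplicated().any():
    raise ValueError("duplicate labels in the nodes file")

# Label -> Id lookup table.
d = dict(zip(nodes["Label"], nodes["Id"]))
```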

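Unfortunately, in the pandas version used here, .replace() doesn't work on a DataFrame the way you might think, because it applies to rows and columns rather than to individual elements. But we can stack and unstack to get around that: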

>>> weight.stack()
0  Source    fie
   target    fou
   Weight      2
1  Source    fie
   target    fim
   Weight      2
2  Source    fou
   target    fee
   Weight      2
3  Source    fee
   target    fim
   Weight      3
dtype: object
>>> weight.stack().replace(d)
0  Source    1
   target    2
   Weight    2
1  Source    1
   target    3
   Weight    2
2  Source    2
   target    4
   Weight    2
3  Source    4
   target    3
   Weight    3
dtype: object
>>> weight.stack().replace(d).unstack()
  Source target Weight
0      1      2      2
1      1      3      2
2      2      4      2
3      4      3      3
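In more recent pandas releases, DataFrame.replace called with a flat dict (and no value argument) substitutes individual cell values, so the reshaping step may not be needed. A minimal sketch, assuming a reasonably current pandas and the sample frames rebuilt inline:

```python
import pandas as pd

nodes = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Label": ["fie", "fou", "fim", "fee"],
})
weight = pd.DataFrame({
    "Source": ["fie", "fie", "fou", "fee"],
    "target": ["fou", "fim", "fee", "fim"],
    "Weight": [2, 2, 2, 3],
})

d = dict(zip(nodes["Label"], nodes["Id"]))

# A flat {old: new} dict is applied to every cell; Weight has no matching keys.
replaced = weight.replace(d)
```

Depending on the pandas version, the resulting Source and target columns may come back as integer or object dtype, so it is safer to compare values than dtypes.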

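Alternatively, we can use a lambda with applymap. Dictionaries have a get method that accepts a default argument, so somedict.get(k, 'default value go here') looks up the key k and returns the corresponding value if the key is found, or the second argument otherwise. So d.get(x, x) either changes x to the corresponding value in the dictionary or returns x unchanged. Hence: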

>>> weight.applymap(lambda x: d.get(x,x))
   Source  target  Weight
0       1       2       2
1       1       3       2
2       2       4       2
3       4       3       3
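Note that DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, which does the same element-wise job. The sketch below picks whichever method the installed version provides; the version check is an addition for illustration, not part of the original answer.

```python
import pandas as pd

weight = pd.DataFrame({
    "Source": ["fie", "fie", "fou", "fee"],
    "target": ["fou", "fim", "fee", "fim"],
    "Weight": [2, 2, 2, 3],
})
d = {"fie": 1, "fou": 2, "fim": 3, "fee": 4}


def lookup(x):
    # Return the mapped Id, or x itself when x is not a known label.
    return d.get(x, x)


if hasattr(pd.DataFrame, "map"):
    # pandas 2.1 and later
    result = weight.map(lookup)
else:
    # older pandas
    result = weight.applymap(lookup)
```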

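PS: If you only want to apply the replacement to certain columns, the same dict-based approach works, but you have to restrict where it is applied. For example, if you wanted to go the other way (from Id back to Label), you probably wouldn't want the 2 in the Weight column to become fou.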
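One way to restrict the replacement is to map only the Source and target columns with Series.map and leave Weight alone. This is a sketch of that idea, including the reverse direction; the names cols, indexed, inverse, and restored are illustrative.

```python
import pandas as pd

weight = pd.DataFrame({
    "Source": ["fie", "fie", "fou", "fee"],
    "target": ["fou", "fim", "fee", "fim"],
    "Weight": [2, 2, 2, 3],
})
d = {"fie": 1, "fou": 2, "fim": 3, "fee": 4}
cols = ["Source", "target"]

# Forward: names -> ids, only in the name columns.
indexed = weight.copy()
for col in cols:
    indexed[col] = weight[col].map(d)

# Reverse: ids -> names. Weight is never touched, so its 2s stay 2s.
inverse = {v: k for k, v in d.items()}
restored = indexed.copy()
for col in cols:
    restored[col] = indexed[col].map(inverse)
```

Unlike d.get(x, x), Series.map turns values that are missing from the dict into NaN, which makes unmatched names easy to spot.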

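I'm not sure what your desired output looks like. Do you want the first row of the second file to become gnome giant 2?

Thanks. Edited to add the desired output.

Interesting! Stacking/unstacking seems a bit expensive. Is the second approach faster?

Stacking and unstacking aren't as slow as you might think, but in this case at least, it is slower. In my tests the applymap approach seemed to be consistently about 2x faster, but YMMV, and I'd be very surprised if this were a bottleneck.

Strangely, neither works. The replace method only replaced the first column.

I find that very surprising. Can you find a minimal example where it doesn't work and post its .to_dict()? One possibility is that there's whitespace in one column but not in the other, so the replacement has no effect because nothing actually matches.

Hmm... whitespace is possible. Let me try.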
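If stray whitespace is the culprit, stripping the name columns before the lookup should make the keys match. The frame below is made-up data that reproduces the suspected problem; it is not taken from the asker's files.

```python
import pandas as pd

# Hypothetical edge list with leading/trailing spaces in the name columns.
messy = pd.DataFrame({
    "Source": ["fie ", " fou"],
    "target": ["fou", "fee "],
    "Weight": [2, 2],
})
d = {"fie": 1, "fou": 2, "fim": 3, "fee": 4}

# Without cleaning, padded names do not match any key and map to NaN.
before = messy["Source"].map(d)

# Strip whitespace in the name columns, then look up again.
for col in ["Source", "target"]:
    messy[col] = messy[col].str.strip()
after = messy["Source"].map(d)
```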