Gremlin on Neptune: how to calculate the distance to all nodes using proportional weights (gremlin, tinkerpop3, amazon-neptune, gremlinpython)


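For the scenario below, I am having a hard time coming up with a Gremlin query. This is the directed graph (it may be cyclic).

I want to get the top N favored nodes, starting from the node "Jane", where favor is defined as: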

favor(Jane->Lisa) = edge(Jane,Lisa) / total weight of the outward edges of Jane
favor(Jane->Thomas) = favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)

favor(Jane->Jerryd) = favor(Jane->Thomas) * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)

favor(Jane->Jerryd) = [favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)] * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)

and so on.

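Here is the same graph annotated with what I mean, calculated by hand.

This would be fairly simple to program, but I am not sure how to query it with Gremlin or even SPARQL.

Here is the query that creates this example graph: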

g
.addV('person').as('1').property(single, 'name', 'jane')
.addV('person').as('2').property(single, 'name', 'thomas')
.addV('person').as('3').property(single, 'name', 'lisa')
.addV('person').as('4').property(single, 'name', 'wyd')
.addV('person').as('5').property(single, 'name', 'jerryd')
.addE('favor').from('1').to('2').property('weight', 10)
.addE('favor').from('1').to('3').property('weight', 20)
.addE('favor').from('3').to('2').property('weight', 90)
.addE('favor').from('2').to('4').property('weight', 50)
.addE('favor').from('2').to('5').property('weight', 90)
.addE('favor').from('3').to('5').property('weight', 100)

What I want is:

[Lisa, computedFavor]
[Thomas, computedFavor]
[Jerryd, computedFavor]
[Wyd, computedFavor]

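I am having a hard time adjusting the weights when the graph is cyclic. This is as far as I have got with a query:

In response to Stephen Mallette's comment: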

favor(Jane->Jerryd) = 
    favor(Jane->Thomas) * favor(Thomas->Jerryd) 
  + favor(Jane->Lisa) * favor(Lisa->Jerryd)

// note we can expand on favor(Jane->Thomas) in above expression
// 
// favor(Jane->Thomas) is favor(Jane->Thomas)@directEdge +
//                        favor(Jane->Lisa) * favor(Lisa->Thomas)
//

Worked example

Jane to Lisa                   => 20/(10+20)         => 2/3
Lisa to Jerryd                 => 100/(100+90)       => 10/19
Jane to Lisa to Jerryd         => 2/3*(10/19)

Jane to Thomas (directly)      => 10/(10+20)         => 1/3
Jane to Lisa to Thomas         => 2/3 * 90/(100+90)  => 2/3 * 9/19
Jane to Thomas                 => 1/3 + (2/3 * 9/19)

Thomas to Jerryd               => 90/(90+50)         => 9/14
Jane to Thomas to Jerryd       => [1/3 + (2/3 * 9/19)] * (9/14)

Jane to Jerryd:
= Jane to Lisa to Jerryd + Jane to Thomas to Jerryd
= 2/3 * (10/19) + [1/3 + (2/3 * 9/19)] * (9/14)
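
The hand calculation above can be checked with exact fractions. This is only a sketch of the arithmetic in the worked example; the variable names are made up for illustration, and the weights are the ones from the sample graph.

```python
from fractions import Fraction

# Direct ratios: edge weight / total weight of the source's outgoing edges.
jane_lisa = Fraction(20, 10 + 20)            # 2/3
jane_thomas_direct = Fraction(10, 10 + 20)   # 1/3
lisa_jerryd = Fraction(100, 100 + 90)        # 10/19
lisa_thomas = Fraction(90, 100 + 90)         # 9/19
thomas_jerryd = Fraction(90, 90 + 50)        # 9/14

# Jane to Thomas: the direct edge plus the route through Lisa.
jane_thomas = jane_thomas_direct + jane_lisa * lisa_thomas

# Jane to Jerryd: through Lisa, plus through Thomas (however Thomas was reached).
jane_jerryd = jane_lisa * lisa_jerryd + jane_thomas * thomas_jerryd

print(jane_thomas)   # 37/57
print(jane_jerryd)   # 613/798
```

As a decimal, 613/798 is roughly 0.768 and 37/57 is roughly 0.649.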

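Here is some pseudocode:

from collections import deque

def get_favors(graph, label="jane", starting_favor=1):
    start = graph.findNode(label)
    queue = deque([(start, starting_favor)])
    favors = {}
    seen = set()
    while queue:
        node, curr_favor = queue.popleft()
        # get the total weight of this node's outward edges
        total_favor = 0
        for (edgeW, outNode) in node.out_edges:
            total_favor = total_favor + edgeW
        for (edgeW, outNode) in node.out_edges:
            # proportional share of the current favor carried by this edge
            share = curr_favor * (edgeW / total_favor)
            # if there is no favor for this node yet, start from the share
            if outNode not in favors:
                favors[outNode] = share
            # it already has some favor, so we add the proportional share
            else:
                favors[outNode] += share
            # if we have already seen this edge, ignore the node
            # otherwise, traverse, passing on only the new share
            if (edgeW, outNode) not in seen:
                seen.add((edgeW, outNode))
                queue.append((outNode, share))
    # sort by value, highest first, so the caller can take the top X
    return sorted(favors.items(), key=lambda kv: kv[1], reverse=True)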

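Here is a Gremlin query that I believe applies your formula correctly. I will paste the full final query first and then say a few words about the steps involved.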

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack().
.....12>    sum()     

==>0.768170426065163

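The query starts at Jane and keeps traversing until all paths to Jerryd have been inspected. For each traverser a sack is maintained that holds the calculated weight values for each relationship multiplied together. The calculation on line 6 sums the weights of all edges leaving the prior vertex, and the math step on line 7 divides the weight on the current edge by that sum. At the very end, the calculated results are added together on line 12. If you remove the final sum step you can see the intermediate results.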

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack()

==>0.2142857142857143
==>0.3508771929824561
==>0.2030075187969925

To see the routes taken, you can add a path step to the query.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold()) 

==>[[jane,10,thomas,90,jerryd],0.2142857142857143]
==>[[jane,20,lisa,100,jerryd],0.3508771929824561]
==>[[jane,20,lisa,90,thomas,90,jerryd],0.2030075187969925]   
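
For readers who want to sanity-check these numbers outside Gremlin, here is a rough plain-Python sketch of what the traversal does: walk every simple path from the start to the target and multiply the per-edge ratios along the way, much like the sack. The OUT_EDGES dictionary and the function name are assumptions made for this illustration, and the sketch assumes the start and target differ.

```python
# Sample graph from the question: source -> list of (weight, target).
OUT_EDGES = {
    "jane": [(10, "thomas"), (20, "lisa")],
    "lisa": [(90, "thomas"), (100, "jerryd")],
    "thomas": [(50, "wyd"), (90, "jerryd")],
    "wyd": [],
    "jerryd": [],
}

def weighted_paths(out_edges, start, target):
    """Return (path, product) for each simple path from start to target."""
    results = []

    def walk(node, visited, path, sack):
        if node == target:                      # plays the role of until(...)
            results.append((path, sack))
            return
        total = sum(w for w, _ in out_edges[node])
        for w, nxt in out_edges[node]:
            if nxt in visited:                  # plays the role of simplePath()
                continue
            # multiply the running value by weight / total outgoing weight
            walk(nxt, visited | {nxt}, path + [w, nxt], sack * w / total)

    walk(start, {start}, [start], 1.0)
    return results

paths = weighted_paths(OUT_EDGES, "jane", "jerryd")
for path, value in paths:
    print(path, value)
print(sum(value for _, value in paths))
```

The sketch walks depth first, so the paths may come out in a different order than in the console output, but the per-path products and their sum should agree with it.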

This approach also accounts for adding in any direct connections, as your formula requires, which we can see if we use Thomas as the target.

gremlin>  g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','thomas')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold())    

==>[[jane,10,thomas],0.3333333333333333]
==>[[jane,20,lisa,90,thomas],0.3157894736842105]  

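These extra steps are not needed, but including the path is very useful when debugging queries like this. Also, although it is not required and is perhaps only of general interest, you can get to the final answer from here as well; the first query I included is all you really need.
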
g.withSack(1).V().
   has('name','jane').
   repeat(outE().
          sack(mult).
            by(project('w','f').
              by('weight').
              by(outV().outE().values('weight').sum()).
              math('w / f')).
          inV().
          simplePath()).
   until(has('name','thomas')).
   local(
     union(
       path().
         by('name').
         by('weight'),
       sack()).fold().tail(local)).  
    sum() 
  
==>0.6491228070175439  

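Please let me know if any of this is unclear or if I have misunderstood the formula.

EDITED to add

To find the results for everyone reachable from Jane, I had to modify the query a bit. The unfold at the end is just to make the results easier to read.
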
gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold()).
.....17>        group().
.....18>          by(tail(local,2).limit(local,1)).
.....19>          by(tail(local).sum()).
.....20>        unfold()

==>jerryd=0.768170426065163
==>wyd=0.23182957393483708
==>lisa=0.6666666666666666
==>thomas=0.6491228070175439    

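The final group step on line 17 uses the path results to calculate the total favor for each unique name found. To look at the paths, you can run the query with the group step removed.
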
gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold())

==>[jane,10,thomas,0.3333333333333333]
==>[jane,20,lisa,0.6666666666666666]
==>[jane,10,thomas,50,wyd,0.11904761904761904]
==>[jane,10,thomas,90,jerryd,0.2142857142857143]
==>[jane,20,lisa,90,thomas,0.3157894736842105]
==>[jane,20,lisa,100,jerryd,0.3508771929824561]
==>[jane,20,lisa,90,thomas,50,wyd,0.11278195488721804]
==>[jane,20,lisa,90,thomas,90,jerryd,0.2030075187969925]    
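
The emit-and-group idea can also be sketched in plain Python: every time a simple path reaches a vertex, its running product is added to that vertex's total. This is only an illustration of the grouping logic, not the Gremlin implementation; the OUT_EDGES dictionary and the function name are assumed for the example.

```python
# Sample graph from the question: source -> list of (weight, target).
OUT_EDGES = {
    "jane": [(10, "thomas"), (20, "lisa")],
    "lisa": [(90, "thomas"), (100, "jerryd")],
    "thomas": [(50, "wyd"), (90, "jerryd")],
    "wyd": [],
    "jerryd": [],
}

def favor_by_vertex(out_edges, start):
    """Sum the per-path products for every vertex reachable from start."""
    totals = {}
    stack = [(start, (start,), 1.0)]            # (vertex, visited names, sack)
    while stack:
        node, visited, sack = stack.pop()
        total = sum(w for w, _ in out_edges[node])
        for w, nxt in out_edges[node]:
            if nxt in visited:                  # simple paths only, so cycles end
                continue
            value = sack * w / total
            # like emit() followed by group(): add this path's value to its end vertex
            totals[nxt] = totals.get(nxt, 0.0) + value
            stack.append((nxt, visited + (nxt,), value))
    return totals

totals = favor_by_vertex(OUT_EDGES, "jane")
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for name, value in top:
    print(name, value)
```

Sorting the totals in descending order gives the "top N" list the question asks for; slicing `top[:n]` would keep only the first n entries.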

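This is quite elegant and is the best fit for an environment involving Neptune and Python. I offer a second approach for reference, in case others come across this question. From the moment I saw this question I could only picture it being solved with a GraphComputer in an OLAP fashion, and as a result I had a hard time thinking about it any other way. Of course, using a VertexProgram requires a JVM language like Java and will not work directly with Neptune. I suppose my closest workaround would be to use Java, grab a subgraph() from Neptune and then run the custom VertexProgram locally in TinkerGraph, which would be quite fast.

More generally, without the Python/Neptune requirement, converting the algorithm to a VertexProgram is not a bad approach, depending on the nature of the graph and the amount of data that needs to be traversed. As there is not a lot of content out there on this topic, I thought I would offer the core of the code here. This is the guts of it:
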
        @Override
        public void execute(final Vertex vertex, final Messenger<Double> messenger, final Memory memory) {
            // on the first pass calculate the "total favor" for all vertices
            // and pass the calculated current favor forward along incident edges
            // only for the "start vertex" 
            if (memory.isInitialIteration()) {
                copyHaltedTraversersFromMemory(vertex);

                final boolean startVertex = vertex.value("name").equals(nameOfStartVertrex);
                final double initialFavor = startVertex ? 1d : 0d;
                vertex.property(VertexProperty.Cardinality.single, FAVOR, initialFavor);
                vertex.property(VertexProperty.Cardinality.single, TOTAL_FAVOR,
                        IteratorUtils.stream(vertex.edges(Direction.OUT)).mapToDouble(e -> e.value("weight")).sum());

                if (startVertex) {
                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT, !incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                                (double) incident.value("weight") /  (double) vertex.value(TOTAL_FAVOR));
                    }
                }
            } else {
                // on future passes, sum all the incoming "favor" and add it to
                // the "favor" property of each vertex. then once again pass the
                // current favor to incident edges. this will keep happening 
                // until the message passing stops.
                final Iterator<Double> messages = messenger.receiveMessages();
                final boolean hasMessages = messages.hasNext();
                if (hasMessages) {
                    double adjacentFavor = IteratorUtils.reduce(messages, 0.0d, Double::sum);
                    vertex.property(VertexProperty.Cardinality.single, FAVOR, (double) vertex.value(FAVOR) + adjacentFavor);

                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT, !incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                                adjacentFavor * ((double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR)));
                    }
                }
            }
        }
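
To make the message passing easier to follow, here is a rough plain-Python simulation of the rounds performed by the execute method above. It is not TinkerPop code; the OUT_EDGES dictionary, the function name, and the max_iterations cap are assumptions made for this sketch.

```python
# Sample graph from the question: source -> list of (weight, target).
OUT_EDGES = {
    "jane": [(10, "thomas"), (20, "lisa")],
    "lisa": [(90, "thomas"), (100, "jerryd")],
    "thomas": [(50, "wyd"), (90, "jerryd")],
    "wyd": [],
    "jerryd": [],
}

def run_favor_program(out_edges, start, max_iterations=20):
    """Simulate the supersteps: returns (favor, total_favor) per vertex."""
    # Initial iteration: every vertex stores its total outgoing weight,
    # and only the start vertex has favor 1 and sends messages.
    favor = {v: (1.0 if v == start else 0.0) for v in out_edges}
    total = {v: float(sum(w for w, _ in edges)) for v, edges in out_edges.items()}
    inbox = {}
    for w, target in out_edges[start]:
        inbox.setdefault(target, []).append(w / total[start])

    # Later iterations: sum the incoming favor, add it to the vertex, and
    # forward that sum along the outgoing edges in proportion to their weight.
    for _ in range(max_iterations):
        if not inbox:                           # no messages left, so halt
            break
        next_inbox = {}
        for vertex, messages in inbox.items():
            adjacent_favor = sum(messages)
            favor[vertex] += adjacent_favor
            for w, target in out_edges[vertex]:
                next_inbox.setdefault(target, []).append(
                    adjacent_favor * w / total[vertex])
        inbox = next_inbox
    return favor, total

favor, total = run_favor_program(OUT_EDGES, "jane")
for name in OUT_EDGES:
    print(name, favor[name], total[name])
```

On the acyclic sample graph the messages die out on their own; the max_iterations cap is there only so that the sketch also terminates on a cyclic graph.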

ComputerResult result = graph.compute().program(FavorVertexProgram.build().name("jane").create()).submit().get();
GraphTraversalSource rg = result.graph().traversal();
Traversal elements = rg.V().elementMap();

The "elements" traversal yields:

{id=0, label=person, ^favor=1.0, name=jane, ^totalFavor=30.0}
{id=2, label=person, ^favor=0.6491228070175439, name=thomas, ^totalFavor=140.0}
{id=4, label=person, ^favor=0.6666666666666666, name=lisa, ^totalFavor=190.0}
{id=6, label=person, ^favor=0.23182957393483708, name=wyd, ^totalFavor=0.0}
{id=8, label=person, ^favor=0.768170426065163, name=jerryd, ^totalFavor=0.0}

For clarity, could you please update the question to specify how favor(Jane->JerryD) is calculated? I just want to be sure I fully understand the calculation. It might also help if you updated your Gremlin sample data to match your picture exactly.

@stephenmallette I have added the Jane->Jerryd calculation and also some pseudocode in Python. The Gremlin data now matches the picture (in the Gremlin data, vertices are persons with a name and edges carry a weight). Let me know if there is anything more I can clarify.

Hi @kelvin lawrence, thanks for your answer. The calculation looks correct! However, is there a way to calculate Jane's favor for every unique node? In your answer we currently traverse until Jerryd. How would I go about getting a list of the unique nodes reachable from Jane, sorted by favor? E.g. => jerryd (favorScore), lisa (favorScore), thomas (favorScore), wyd (favorScore). In this case it is fine to limit the traversal depth to X rather than traversing indefinitely. Also, I want to pass on my thanks for making the book publicly available; it has helped me a lot :)

I updated the answer.