Python: How do I run personalized PageRank with PySpark?
I want to set node 1 as the start node in order to implement personalized PageRank. How do I add a start node?
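import networkx as nx
from operator import add
from pyspark import SparkContext

# In the pyspark shell `sc` already exists; getOrCreate() simply reuses it.
sc = SparkContext.getOrCreate()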
link_data = {
    0: [1, 2],
    1: [2, 6],
    2: [1, 0],
    3: [1, 0],
    4: [1],
    5: [0, 1],
    6: [0, 7],
    7: [0, 1, 2, 3, 9],
    8: [5, 9],
    9: [7]
}
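link_graph = nx.DiGraph(link_data)  # built here but not used by the Spark code below
# Every node starts with rank 1.0.
ranks = sc.range(len(link_data)).map(lambda x: (x, 1.))
# (node, [outgoing neighbours]) pairs, cached because they are reused every iteration.
links = sc.parallelize(link_data.items()).cache()
links.join(ranks).collect()  # inspect the joined (node, (neighbours, rank)) pairs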
def computeContribs(node_urls_rank):
    _, (urls, rank) = node_urls_rank
    nb_urls = len(urls)
    for url in urls:
        yield url, rank / nb_urls
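for iteration in range(10):
    # Each node sends rank / out-degree to each of its neighbours.
    contribs = links.join(ranks).flatMap(computeContribs)
    # Keep nodes that received nothing (no inbound links) with a contribution of 0.
    contribs = links.fullOuterJoin(contribs).mapValues(lambda x: x[1] or 0.0)
    ranks = contribs.reduceByKey(add)
    # Damping: every node receives the same 0.15 restart term.
    ranks = ranks.mapValues(lambda rank: rank * 0.85 + 0.15)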
for (link, rank) in sorted(ranks.collect()):
    print("%s has rank: %s." % (link, rank / len(link_data)))
Can anyone help me? Where should I set the sourceId to personalize the results for that vertex? Thanks.
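A possible answer, offered as a minimal sketch rather than a definitive implementation: in the loop above, the `+ 0.15` term hands the restart (teleport) mass to every node equally. For personalized PageRank that mass would go only to the source node, so the sourceId belongs in the damping step and, optionally, in the initial ranks. The names `personalized_pagerank`, `teleport`, `compute_contribs`, `source_id` and `iterations` are hypothetical helpers introduced here, not part of PySpark. `sc` is assumed to be an existing SparkContext, and every node is assumed to have at least one outgoing link, as is the case in `link_data`.

```python
from operator import add

DAMPING = 0.85  # same damping factor as in the question


def compute_contribs(node_urls_rank):
    """Split a node's rank evenly across its outgoing links."""
    _, (urls, rank) = node_urls_rank
    nb_urls = len(urls)
    for url in urls:
        yield url, rank / nb_urls


def teleport(node, rank, source_id, damping=DAMPING):
    """Apply damping; only the source node receives the restart mass."""
    restart = (1.0 - damping) if node == source_id else 0.0
    return node, damping * rank + restart


def personalized_pagerank(sc, link_data, source_id, iterations=10):
    """Sketch of personalized PageRank on RDDs (hypothetical helper).

    Assumes `sc` is a live SparkContext and `link_data` maps each node
    to a non-empty list of outgoing neighbours.
    """
    links = sc.parallelize(list(link_data.items())).cache()
    # Start with all of the rank mass on the source node.
    ranks = links.map(lambda kv: (kv[0], 1.0 if kv[0] == source_id else 0.0))
    for _ in range(iterations):
        contribs = links.join(ranks).flatMap(compute_contribs)
        # Keep nodes without inbound links, with a contribution of 0.
        contribs = links.fullOuterJoin(contribs).mapValues(lambda x: x[1] or 0.0)
        # Sum contributions, then restart only at the source node.
        ranks = contribs.reduceByKey(add).map(
            lambda kv: teleport(kv[0], kv[1], source_id)
        )
    return dict(ranks.collect())
```

With `sc` available, `personalized_pagerank(sc, link_data, source_id=1)` returns a dict mapping each node to its rank. Because all of the mass starts on the source and only the source is topped up, the ranks already sum to about 1 under the assumption above, so the final division by `len(link_data)` is no longer needed. On a graph this small, `nx.pagerank(link_graph, alpha=0.85, personalization={1: 1.0})` from networkx should give roughly comparable values and can serve as a sanity check.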