Hadoop: Getting friends within a specified degree using MapReduce


Do you know how to implement this algorithm using the MapReduce paradigm?

def getFriends(self, degree):
    # Collect every friend reachable within `degree` hops (including self).
    friendList = []
    self._getFriends(degree, friendList)
    return friendList

def _getFriends(self, degree, friendList):
    friendList.append(self)
    if degree:
        for friend in self.friends:
            friend._getFriends(degree - 1, friendList)
Suppose we have the following bidirectional friendships:

(1,2), (1,3), (1,4), (4,5), (4,6), (5,7), (5,8)

For example, how do I get user 1's degree-1, degree-2, and degree-3 connections? Given the edges above, the answer should be 1 -> 2, 3, 4, 5, 6, 7, 8.


Thanks

Perhaps you could use Hive, which supports SQL-like queries.

As far as I understand, you want to collect all friends within the n-th circle of some person in a social graph. Most graph algorithms are recursive, and recursion is not well suited to solving tasks with the MapReduce approach.

I can suggest you use Apache Giraph to solve this problem (under the hood it actually uses MapReduce). It is mostly asynchronous, and you write jobs describing the behaviour of a single node, like:

1. Send a message from the root node to all friends asking for their friend lists.
2.1. Each friend sends a message with its friend list back to the root node.
2.2. Each friend sends a message to all of its sub-friends asking for their friend lists.
3.1. Each sub-friend sends a message with its friend list back to the root node.
3.2. Each sub-friend sends a message to all of its sub-sub-friends asking for their friend lists.
...
N. The root node collects all these messages and merges them into a single list.
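
To make this message flow concrete, here is a small self-contained Java sketch that simulates these supersteps locally. It is plain Java, not the actual Giraph API; the adjacency map stands in for the distributed graph:

    import java.util.*;

    // Local simulation of the message-passing rounds described above.
    public class CirclesSimulation {

        public static Set<Integer> friendsWithinDegree(Map<Integer, List<Integer>> graph,
                                                       int root, int degree) {
            Set<Integer> collected = new HashSet<>();   // friend lists merged at the root
            Set<Integer> visited = new HashSet<>();     // nodes that already sent messages
            visited.add(root);

            // "Messages" sent in the current superstep: the current frontier.
            Set<Integer> frontier = new HashSet<>(Collections.singletonList(root));

            for (int step = 0; step < degree; step++) {
                Set<Integer> nextFrontier = new HashSet<>();
                for (int node : frontier) {
                    for (int friend : graph.getOrDefault(node, Collections.emptyList())) {
                        if (visited.add(friend)) {      // first time we reach this node
                            collected.add(friend);      // its friend list reaches the root
                            nextFrontier.add(friend);   // it will message its own friends
                        }
                    }
                }
                frontier = nextFrontier;                // next superstep
            }
            return collected;
        }

        public static void main(String[] args) {
            // Bidirectional friendships from the question.
            Map<Integer, List<Integer>> graph = new HashMap<>();
            int[][] edges = {{1,2},{1,3},{1,4},{4,5},{4,6},{5,7},{5,8}};
            for (int[] e : edges) {
                graph.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
                graph.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
            }
            // Prints [2, 3, 4, 5, 6, 7, 8]
            System.out.println(new TreeSet<>(friendsWithinDegree(graph, 1, 3)));
        }
    }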
You could also use a cascade of MapReduce jobs to collect the circles, but this is not a very efficient way to solve the task (see the sketch after this list):

  • Export the root user's friends to a file circle-001
  • Use circle-001 as input for a job that exports each of those users' friends to circle-002
  • Do the same, but use circle-002 as input
  • Repeat N times

  • The first approach is more suitable if there are many users whose circles must be computed. The second one has the large overhead of starting multiple MR jobs, but it is much simpler and works fine for a small set of input users.
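
A rough sketch of such a cascade driver follows. It is only an illustration: the ExpandFriendsMapper and UniqueFriendsReducer names are hypothetical placeholders for a job that joins the previous circle against the friendship list, and they are left commented out so the skeleton compiles on its own.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Illustrative cascade: round i reads circle-00i and writes circle-00(i+1).
    public class CircleCascadeDriver {
      public static void main(String[] args) throws Exception {
        int degree = Integer.parseInt(args[0]); // e.g. 3
        Configuration conf = new Configuration();

        for (int i = 1; i < degree; i++) {
          Job job = Job.getInstance(conf, String.format("circle-%03d", i + 1));
          job.setJarByClass(CircleCascadeDriver.class);
          // Hypothetical classes: expand each user in the previous circle to
          // its friends, deduplicate, and drop already-seen users.
          // job.setMapperClass(ExpandFriendsMapper.class);
          // job.setReducerClass(UniqueFriendsReducer.class);
          FileInputFormat.addInputPath(job, new Path(String.format("circle-%03d", i)));
          FileOutputFormat.setOutputPath(job, new Path(String.format("circle-%03d", i + 1)));
          if (!job.waitForCompletion(true)) {
            System.exit(1); // abort the cascade if a round fails
          }
        }
      }
    }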

I am new to this area, but here are my thoughts.

You could use the traditional BFS algorithm, following the pseudocode below.

In each iteration you launch a Hadoop job that discovers all child nodes of the current working set that have not yet been visited.

    BFS(list curNodes, list visited, int depth) {
        if (depth <= 0) {
            return visited;
        }

        // run a Hadoop job on the current working set curNodes, restricted by visited;
        // the job populates a result list with the child nodes of the current working set

        // then:
        visited.addAll(result);
        curNodes.clear();
        curNodes.addAll(result);

        return BFS(curNodes, visited, depth - 1);
    }
    
Thanks for your reply. I just need to do it with MapReduce :-(

Could you explain in more detail how I can cascade the MapReduce jobs to collect the circles?

Updated the original answer to describe the second approach. Note that Giraph actually runs as a MapReduce job under the hood; it is just an abstraction layer for processing large graphs on Hadoop.

Thank you very much for your answer; this helps a lot! Please look at my updated initial question, which makes clearer what I need to achieve. My current attempt:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
    import org.apache.hadoop.util.ToolRunner;

    public static class VertexMapper extends
        Mapper<Object, Text, IntWritable, IntWritable> {

      // State shared between driver and tasks via static fields; this only
      // works in local (single-JVM) mode -- on a real cluster each mapper
      // gets its own copy.
      private static Set<IntWritable> curVertex = null;
      private static IntWritable curLevel = null;
      private static Set<IntWritable> visited = null;

      private IntWritable outKey = new IntWritable();
      private IntWritable outValue = new IntWritable();

      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {

        // Each input line is one edge "a,b"; both directions of a friendship
        // are assumed to be present in the input.
        StringTokenizer itr = new StringTokenizer(value.toString(), ",");
        if (itr.countTokens() == 2) {
          String keyStr = itr.nextToken();
          String valueStr = itr.nextToken();
          try {
            outKey.set(Integer.parseInt(keyStr));
            outValue.set(Integer.parseInt(valueStr));

            // Emit b when a is in the current frontier and b is unvisited.
            // (Fixed: the original compared the raw map-input key/value here
            // because the fields were shadowed by the method parameters.)
            if (VertexMapper.curVertex.contains(outKey)
                && !VertexMapper.visited.contains(outValue)
                && !outKey.equals(outValue)) {
              context.write(VertexMapper.curLevel, outValue);
            }
          } catch (NumberFormatException e) {
            System.err.println("Found key,value <" + keyStr + "," + valueStr
                + "> which cannot be parsed as int");
          }
        } else {
          System.err.println("Found malformed line: " + value.toString());
        }
      }
    }

    public static class UniqueReducer extends
        Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

      // The discovered level is accumulated here instead of being written to
      // HDFS (the job uses NullOutputFormat); again, local mode only.
      private static Set<IntWritable> result = new HashSet<IntWritable>();

      public void reduce(IntWritable key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {

        for (IntWritable val : values) {
          UniqueReducer.result.add(new IntWritable(val.get()));
        }
        // context.write(key, key);
      }
    }
    
    // Driver code (e.g. inside BFSExample#run): seed the frontier with the
    // root user 1 and run the first level.
    UniqueReducer.result.clear();
    VertexMapper.curLevel = new IntWritable(1);
    VertexMapper.curVertex = new HashSet<IntWritable>(1);
    VertexMapper.curVertex.add(new IntWritable(1));
    VertexMapper.visited = new HashSet<IntWritable>(1);
    VertexMapper.visited.add(new IntWritable(1));

    Configuration conf = getConf();
    Job job = Job.getInstance(conf, "BFS");
    job.setJarByClass(BFSExample.class);
    job.setMapperClass(VertexMapper.class);
    // Note: UniqueReducer must not be used as a combiner -- it only collects
    // values into a static set and writes nothing, so a combiner would swallow
    // all map output before it reaches the reducer.
    job.setReducerClass(UniqueReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setOutputFormatClass(NullOutputFormat.class);
    boolean completed = job.waitForCompletion(true);
    
    BFSExample bfs = new BFSExample();
    ToolRunner.run(new Configuration(), bfs, args);
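
The driver above runs only the first BFS level. A minimal sketch of the full loop, under the same local-mode assumption (the static sets are shared between driver and tasks), might look like this; the maxDegree variable is illustrative:

    // Run one job per level, feeding each level's result back as the frontier.
    int maxDegree = 3; // illustrative: the degree requested by the user
    VertexMapper.curVertex = new HashSet<IntWritable>();
    VertexMapper.curVertex.add(new IntWritable(1)); // root user
    VertexMapper.visited = new HashSet<IntWritable>();
    VertexMapper.visited.add(new IntWritable(1));

    for (int level = 1; level <= maxDegree; level++) {
      UniqueReducer.result.clear();
      VertexMapper.curLevel = new IntWritable(level);

      Job job = Job.getInstance(getConf(), "BFS-level-" + level);
      job.setJarByClass(BFSExample.class);
      job.setMapperClass(VertexMapper.class);
      job.setReducerClass(UniqueReducer.class);
      job.setOutputKeyClass(IntWritable.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      job.setOutputFormatClass(NullOutputFormat.class);
      if (!job.waitForCompletion(true)) {
        break; // stop if a level fails
      }

      // This level's discoveries become the next frontier.
      VertexMapper.visited.addAll(UniqueReducer.result);
      VertexMapper.curVertex = new HashSet<IntWritable>(UniqueReducer.result);
    }
    // VertexMapper.visited now holds user 1 plus all friends within maxDegree.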