Modeling a SQL query in Hadoop MapReduce


hadoop mapreduce


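I am trying to model a SQL query such as SELECT DISTINCT(col1) FROM table WHERE col2 = value2 in MapReduce. My logic is that each mapper checks the WHERE clause and, if the row matches, emits the WHERE-clause value as the key and col1 as the value. With the default hash partitioner, all output then goes to the same reducer, because every record carries the WHERE-clause value as its key. In the reducer I can eliminate the duplicates and emit the distinct values. Is this the right approach?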


Note: the data for this query is in a CSV file.

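// MAPPER pseudo code
public static class DistinctMapper extends Mapper<Object, Text, Text, NullWritable> {
    private final Text col1 = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Extract the columns from the CSV line.
        // Assumption: naive comma split, col1 at index 0 and col2 at index 1.
        String[] columns = value.toString().split(",", -1);
        if (columns.length < 2) {
            return; // skip malformed lines
        }
        String c1 = columns[0];
        String c2 = columns[1];

        // WHERE col2 = value2: drop rows that do not match the filter value.
        // Strings must be compared with equals(), not != or ==.
        if (!"WhereClauseValue".equals(c2)) {
            return;
        }
        // Mapper output key is the distinct column value
        col1.set(c1);
        // Mapper output value is NullWritable, since no payload is needed
        context.write(col1, NullWritable.get());
    }
}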

// REDUCER pseudo code
public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // reduce() is called once per distinct key, so write the key once
        // with a null value; the list of values is not needed here.
        context.write(key, NullWritable.get());
    }
}
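
The mapper and reducer above can be traced without a cluster. The following is a minimal plain-Java sketch of the same flow (filter in the map step, group by key in the shuffle, one output per key in the reduce step). It uses no Hadoop classes; the class name DistinctSimulation, the column positions (col1 at index 0, col2 at index 1) and the sample rows are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map -> shuffle -> reduce flow; not Hadoop code.
class DistinctSimulation {

    // Map step: keep rows whose col2 equals whereValue and emit col1 as the key.
    // Assumption: naive comma split, col1 at index 0 and col2 at index 1.
    static List<String> map(List<String> csvLines, String whereValue) {
        List<String> emittedKeys = new ArrayList<>();
        for (String line : csvLines) {
            String[] columns = line.split(",", -1);
            if (columns.length >= 2 && whereValue.equals(columns[1])) {
                emittedKeys.add(columns[0]);
            }
        }
        return emittedKeys;
    }

    // Shuffle step: group the emitted keys; the sorted map mimics sort-by-key.
    // The Integer counts how many (key, null) pairs arrived for each key.
    static Map<String, Integer> shuffle(List<String> emittedKeys) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String key : emittedKeys) {
            grouped.merge(key, 1, Integer::sum);
        }
        return grouped;
    }

    // Reduce step: one call per key; write the key once and ignore the values.
    static List<String> reduce(Map<String, Integer> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    static List<String> distinctCol1(List<String> csvLines, String whereValue) {
        return reduce(shuffle(map(csvLines, whereValue)));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a,x", "b,y", "a,x", "c,x", "b,x");
        System.out.println(distinctCol1(rows, "x")); // prints [a, b, c]
    }
}
```

Because the key is col1 itself, the grouping step is what removes the duplicates, and the reduce step never has to look at the values.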


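Have you tried it?

I need to do it using the MapReduce framework. The logic I am using is that each mapper checks the WHERE clause and, if the row matches, emits the WHERE-clause value as the key and col1 as the value. With the default hash function, all output goes to the same reducer. In the reducer I can eliminate the duplicates and emit the distinct values. Is this the right approach?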
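
The logic described in this comment (WHERE-clause value as the key, col1 as the value) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Hadoop code: the class name SingleKeyDistinct, the column positions (col1 at index 0, col2 at index 1) and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch of keying every matching row by the WHERE-clause value.
class SingleKeyDistinct {

    // Map + shuffle: emit (col2, col1) for matching rows and group by key.
    // Assumption: naive comma split, col1 at index 0 and col2 at index 1.
    static Map<String, List<String>> mapAndShuffle(List<String> csvLines, String whereValue) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String line : csvLines) {
            String[] columns = line.split(",", -1);
            if (columns.length >= 2 && whereValue.equals(columns[1])) {
                grouped.computeIfAbsent(columns[1], k -> new ArrayList<>()).add(columns[0]);
            }
        }
        return grouped;
    }

    // Reduce: the single reduce call must deduplicate the values itself,
    // here with an in-memory set that preserves arrival order.
    static Set<String> reduce(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a,x", "b,y", "a,x", "c,x", "b,x");
        Map<String, List<String>> grouped = mapAndShuffle(rows, "x");
        System.out.println("reduce groups: " + grouped.size());          // prints reduce groups: 1
        System.out.println("distinct col1: " + reduce(grouped.get("x"))); // prints distinct col1: [a, c, b]
    }
}
```

The result is the same set of distinct values, but every matching row lands in a single reduce group, so that one reduce call does all the work and has to hold the distinct values in memory. Keying by col1 instead, as the pseudocode in the question does, spreads the keys across reducers and lets the framework's grouping remove the duplicates.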