Java: Manipulating a user-input string in MapReduce

java string hadoop input mapreduce


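I'm just starting out with Hadoop's flavor of MapReduce, so I don't have much of a clue about the details. I do understand how it works conceptually.

My problem is finding a specific search string in a bunch of files I have been given. I'm not concerned with the files themselves; that part is already sorted out. But how would you go about asking for input? Would you ask in the JobConf section of the program? If so, how would I pass the string into the job?

If it goes in the map() function, how would you implement it? Wouldn't it ask for the search string every time map() is called?

Here are the main method and the JobConf() section, to give you an idea: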

public static void main(String[] args) throws IOException {

    // This produces an output file in which each line contains a separate word followed by
    // the total number of occurrences of that word in all the input files.

    JobConf job = new JobConf();

    FileInputFormat.setInputPaths(job, new Path("input"));
    FileOutputFormat.setOutputPath(job, new Path("output"));

    // Output from reducer maps words to counts.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // The output of the mapper is a map from words (including duplicates) to the value 1.
    job.setMapperClass(InputMapper.class);

    // The output of the reducer is a map from unique words to their total counts.
    job.setReducerClass(CountWordsReducer.class);

    JobClient.runJob(job);
}

And the map() function:

public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {

    // The key is the character offset within the file of the start of the line, ignored.
    // The value is a line from the file.

    //This is me trying to hard-code it. I would prefer an explanation on how to get interactive input!
    String inputString = "data"; 
    String line = value.toString();
    Scanner scanner = new Scanner(line);

    while (scanner.hasNext()) {
        if (line.contains(inputString)) {
            String line1 = scanner.next();
            output.collect(new Text(line1), new LongWritable(1));
        }
    }
    scanner.close();
}


I've been led to believe that I don't need a reducer to solve this problem. Any advice or explanation would be greatly appreciated.

The JobConf class is an extension of the Configuration class, so you can set custom properties:

JobConf job = new JobConf();
job.set("inputString", "data");
...

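Then, as stated in the Mapper documentation: Mapper implementations can access the JobConf for the job via JobConfigurable.configure(JobConf) and initialize themselves. This means you have to override this method in your mapper to obtain the parameter you need: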

private static String inputString;

@Override
public void configure(JobConf job) {
    // Called once per task, before any map() call: read the custom property here.
    inputString = job.get("inputString");
}

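Anyway, this uses the old API. With the new one it is easier to access the configuration, since the context (and therefore the configuration) is passed to the map method as a parameter.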
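For comparison, a minimal sketch of the same idea with the newer org.apache.hadoop.mapreduce API might look like the following. The class name SearchMapper, the helper matches, the empty-string default, and the choice to emit each matching line with a count of 1 are illustrative assumptions, not part of the original code; the property name inputString is the one used above.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper for the new API; the class name is illustrative.
public class SearchMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private static final LongWritable ONE = new LongWritable(1);

    // Instance field: each map task reads the value once in setup().
    private String inputString;

    @Override
    protected void setup(Context context) {
        // The Context exposes the job Configuration, so no configure() override is needed.
        // Assumption: fall back to "" (which matches every line) if the property is unset.
        inputString = context.getConfiguration().get("inputString", "");
    }

    // Pure helper, kept separate so the matching rule is easy to check on its own.
    static boolean matches(String line, String term) {
        return line.contains(term);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The key (offset of the line within the file) is ignored.
        if (matches(value.toString(), inputString)) {
            context.write(value, ONE);
        }
    }
}
```

In a driver for this sketch, the property would be set on the Configuration before the Job is created from it (for example conf.set("inputString", "data") followed by Job.getInstance(conf)), since the Job works on its own copy of the configuration.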

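Nice explanation, just two things. In the job.set() method you pass "data" as the argument. My data is all in the files that are already in the input path, so what should I put there? Second, in the configure method you wrote "test". Is that just a placeholder string, or does "test" have some special significance?

Regarding the second question, that was my mistake; fixed :) Regarding the first, the example shows how to pass the mapper a parameter named "inputString" whose value is "data", because that is the search string you hard-coded in your code; replace it with any other string, which the user can pass as an argument to the main method.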
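A minimal sketch of that suggestion, using the old API from the question: the search string is taken from the first command-line argument and stored in the JobConf, and the mapper reads it back in configure(). The class name SearchDriver, the helper buildJob, the argument order, and the map-only setting are assumptions made for illustration; InputMapper here is a simplified stand-in for the mapper in the question.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical driver; class and helper names are illustrative.
public class SearchDriver {

    // Old-API mapper that reads the search string once per task in configure().
    public static class InputMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {

        private String inputString;

        @Override
        public void configure(JobConf job) {
            inputString = job.get("inputString");
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            // Emit the whole line when it contains the search string.
            if (value.toString().contains(inputString)) {
                output.collect(value, new LongWritable(1));
            }
        }
    }

    // Builds the job configuration from: <searchString> <inputDir> <outputDir>.
    static JobConf buildJob(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: SearchDriver <searchString> <inputDir> <outputDir>");
        }
        JobConf job = new JobConf(SearchDriver.class);
        job.set("inputString", args[0]); // read back in InputMapper.configure()

        FileInputFormat.setInputPaths(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(InputMapper.class);
        job.setNumReduceTasks(0); // assumption: map-only job, no reducer
        return job;
    }

    public static void main(String[] args) throws IOException {
        JobClient.runJob(buildJob(args));
    }
}
```

Because the string is read from the command line once in main and then travels with the job configuration, map() never has to prompt for it; every map task sees the same value.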