Reducing twice on one line in Hadoop


hadoop mapreduce


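Sorry for the confusing title; what I am after is hard to name.

What I want to do is take word sequences as the input of a Hadoop job and output lines in the following form: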

lowercased sequence, frequency of the lowercased sequence, original sequence, frequency of the original sequence

I think it is best explained with an example:

Suppose my input data is:

the sun
the sun
the sun
The sun
The sun
The Sun

I want to get:

the sun 6 the sun 3
the sun 6 The sun 2
the sun 6 The Sun 1

How can I reduce both the frequency of the lowercased sequence and the frequency of the original sequence?

In the map function: output key: sequence.toLowerCase(); output value: the sequence (as is).
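The map step described above can be sketched in plain Java, without any Hadoop classes. The class name CaseFoldingMapSketch and the shuffle helper are hypothetical and exist only for illustration; in a real job the framework does the grouping, and the same key/value choice would sit inside a Mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for the map and shuffle phases (no Hadoop classes involved).
final class CaseFoldingMapSketch {

    // Mirrors the map step: key = lowercased sequence, value = sequence as it appeared.
    static Map.Entry<String, String> map(String sequence) {
        return Map.entry(sequence.toLowerCase(Locale.ROOT), sequence);
    }

    // Mirrors the shuffle: collects every value emitted under the same key.
    static Map<String, List<String>> shuffle(List<String> input) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : input) {
            Map.Entry<String, String> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the sun", "the sun", "the sun", "The sun", "The sun", "The Sun");
        // All six spellings end up under the single key "the sun".
        System.out.println(shuffle(input));
    }
}
```

Because every spelling shares one key, a single reduce call sees all variants of a sequence together, which is what makes counting both frequencies in one pass possible.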

In the reduce function, for each value:

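// Create this once per reduce() call, before looping over the values.
Map<String, Integer> occurrences = new HashMap<String, Integer>();
// Then, for each value: count it toward the lowercased key ...
occurrences.merge(key, 1, Integer::sum);
if (!key.equals(value)) {
    // ... and toward its original spelling.
    occurrences.merge(value, 1, Integer::sum);
}

This is only a sketch; key and value stand for the String forms of the reducer's key and of the current value, and Map and HashMap come from java.util. A plain occurrences.get(...) + 1 would throw a NullPointerException the first time an entry is seen, because get returns null for it. Map.merge (Java 8+) handles that case; an explicit null check works just as well.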

As a result, you will have a map holding the occurrence counts of the different upper/lower-case variants of the same sequence.
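To show how such a map can be turned into the output lines from the question, here is a minimal sketch in plain Java, again without Hadoop classes. It is a variation, not the exact code above: it keeps the total for the lowercased key in a separate counter so that an all-lowercase spelling still gets its own count. The class name CaseCountReduceSketch and the reduce signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the reduce step (no Hadoop classes involved).
final class CaseCountReduceSketch {

    // key: lowercased sequence; values: every original spelling grouped under it.
    static List<String> reduce(String key, List<String> values) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        int total = 0;
        for (String value : values) {
            total++;                                   // frequency of the lowercased sequence
            occurrences.merge(value, 1, Integer::sum); // frequency of this exact spelling
        }
        // One output line per distinct spelling, in order of first appearance.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : occurrences.entrySet()) {
            lines.add(key + " " + total + " " + e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> values = List.of("the sun", "the sun", "the sun", "The sun", "The sun", "The Sun");
        // Prints:
        // the sun 6 the sun 3
        // the sun 6 The sun 2
        // the sun 6 The Sun 1
        reduce("the sun", values).forEach(System.out::println);
    }
}
```

The lines can only be emitted after the loop, since the total for the lowercased key is not known until every value has been seen.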

Thanks @Andrew. I did something similar and it worked well.