Finding the maximum value with mrjob in Python
I want to find the maximum value in a list using mrjob. Whenever I run the job, it always shows these messages:
No configs found; falling back on auto-configuration
No configs specified for inline runner
I would like to know what this means.
class MRWordCounter(MRJob):
    def mapper(self, key, line):
        num = csv_readline(line)
        yield num, 1

    def reducer(self, word, compare):
        num_list = []
        for value in compare:
            if value == max(compare):
                value=num_list
        yield word, num_list
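Those messages are not errors: mrjob logs them when it finds no mrjob.conf file and runs with its default settings. You can use this approach instead: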
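All it does is:
- Map each line to its words
- Combine the counts for each word
- Flip the key/value pairs
- Reduce to find the word that occurs most often.
To run the code,
save the text file and the Python script in the same folder, then run: python3 xyz.py xyz.txt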
# The most occurred word
# Import Dependencies
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the words we've seen so far
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        yield max(word_count_pairs)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]


if __name__ == '__main__':
    MRMostUsedWord.run()
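
To see why flipping the pairs makes max() work, it may help to trace the same four steps in plain Python, without mrjob or Hadoop. This is only a rough sketch of the data flow, not how mrjob runs the job internally; the function name most_used_word and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")


def most_used_word(lines):
    """Trace the job's two steps in plain Python (hypothetical helper, no mrjob)."""
    # Step 1, map: emit a (word, 1) pair for every word in every line.
    pairs = [(word.lower(), 1)
             for line in lines
             for word in WORD_RE.findall(line)]

    # Step 1, combine/reduce: sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count

    # Flip each pair to (count, word) so that max() compares counts first.
    flipped = [(count, word) for word, count in totals.items()]

    # Step 2, reduce: one max() over all flipped pairs.
    return max(flipped) if flipped else None


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "The dog chased the cat"]
    print(most_used_word(sample))  # (4, 'the')
```

Because tuples compare element by element, a tie on the count is broken by the word itself, so the alphabetically last word wins. The same reasoning should apply to a list of numbers: send every value to a single key and take max() over the values in the final reducer.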