MapReduce: how do I pass the input path to MRJob.mapper_raw?
I am running a MapReduce job with the mrjob package, and I am trying to use mapper_raw, as shown below:
from mrjob.job import MRJob
from mrjob.step import MRStep


class MRJOB(MRJob):
    def mapper_raw(self, input_path, input_uri):
        import csv
        print(input_path, input_uri)
        with open(input_path) as f:
            reader = csv.reader(f)
            for line in reader:
                if line:
                    yield (0, line)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_raw)
        ]  # , reducer=self.reducer)]


if __name__ == "__main__":
    MRJOB().run()
However, running it as python mrjobfile.py inputfile.csv produces the following error:
TypeError: expected str, bytes or os.PathLike object, not NoneType
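How do I tell mrjob to treat the input as a string containing the file name? It seems to just pass in the first line of the file.

You have to declare it in MRStep as mapper_raw, not as mapper. A function registered as mapper is called once per input line with (key, value); with the default input protocol the key is None and the value is the line, which is why input_path arrives as None and open() raises the TypeError. With the fix, steps() would look like this: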
    def steps(self):
        return [
            MRStep(mapper_raw=self.mapper_raw)
        ]  # , reducer=self.reducer)]
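Once the step is registered correctly, the file-reading logic inside mapper_raw can be checked on its own, without running a job. The following is only a minimal sketch using the standard library: rows_from_csv is a hypothetical helper name (not part of mrjob) that mirrors the body of mapper_raw above, and the temporary file stands in for inputfile.csv.

```python
import csv
import os
import tempfile


def rows_from_csv(input_path):
    """Yield (0, row) for every non-empty CSV row, like the mapper_raw body above.

    Hypothetical helper for illustration; it is not an mrjob API.
    """
    # newline="" is what the csv module recommends when opening files for csv.reader
    with open(input_path, newline="") as f:
        for row in csv.reader(f):
            if row:  # csv.reader returns [] for a blank line, so this skips it
                yield (0, row)


if __name__ == "__main__":
    # Write a small sample file to stand in for inputfile.csv
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        f.write("a,b\n\nc,d\n")
    try:
        for key, row in rows_from_csv(path):
            print(key, row)
    finally:
        os.remove(path)
```

Inside the job, mapper_raw could then delegate to such a helper with the input_path argument it receives from mrjob.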