Error running an MRJob on Hadoop from a Windows command (Python)


I am trying to execute an MRJob on a Hadoop cluster from the Windows command line. It works when I run the following:

Python C:\Users\salha\Documents\Thesis\Implementation\Jacobi_2classes.py 
C:\Users\salha\Documents\Thesis\Implementation\x.txt 
C:\Users\salha\Documents\Thesis\Implementation\b.txt
C:\Users\salha\Documents\Thesis\Implementation\matrix.txt
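For context, the script name suggests the job implements Jacobi iteration over a matrix, a right-hand side, and a solution vector. The actual `Jacobi_2classes.py` is not shown in the question; the following plain-Python sketch of the Jacobi method is only an illustration of the underlying algorithm, not the author's MRJob code:

```python
def jacobi(A, b, iterations=50):
    """Solve A x = b by Jacobi iteration.

    Converges when A is (strictly) diagonally dominant:
    x_i^(k+1) = (b_i - sum_{j != i} A[i][j] * x_j^(k)) / A[i][i]
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # The comprehension reads the previous iterate x before rebinding it,
        # so every component is updated from the same old vector (Jacobi, not Gauss-Seidel).
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

In an MRJob version, each iteration would typically become one map/reduce step, which is presumably why the job below runs as "step 1 of 2".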
Here is the command I wrote:

Python C:\Users\salha\Documents\Thesis\Implementation\Jacobi_2classes.py -r hadoop --hadoop-streaming-jar "C:\hadoop-2.9.1\share\hadoop\tools\lib\hadoop-streaming-2.9.1.jar" C:\Users\salha\Documents\Thesis\Implementation\x.txt C:\Users\salha\Documents\Thesis\Implementation\b.txt C:\Users\salha\Documents\Thesis\Implementation\matrix.txt
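As an aside, the streaming jar and hadoop binary can also be set in an mrjob config file rather than repeated on the command line. A sketch, using option names from mrjob's hadoop runner configuration (note this only moves the settings; it does not by itself address the "unexpected arguments" error below):

```yaml
# ~/.mrjob.conf (or pass via -c) -- a configuration sketch, not a verified fix
runners:
  hadoop:
    hadoop_bin: C:\hadoop-2.9.1\bin\hadoop.cmd
    hadoop_streaming_jar: C:\hadoop-2.9.1\share\hadoop\tools\lib\hadoop-streaming-2.9.1.jar
```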
And this is what I got:

C:\Users\salha\Anaconda3\lib\site-packages\numpy\__init__.py:140: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
  from . import _distributor_init
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in C:\hadoop-2.9.1\bin\bin...
Looking for hadoop binary in $PATH...
Found hadoop binary: C:\hadoop-2.9.1\bin\hadoop.CMD
Using Hadoop version 2.9.1
Creating temp directory C:\Users\salha\AppData\Local\Temp\Jacobi_2classes.salha.20200303.052236.139525
uploading working dir files to hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd...
Copying other local files to hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/
Running step 1 of 2...
  WARNING: An illegal reflective access operation has occurred
  WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/hadoop-2.9.1/share/hadoop/common/lib/hadoop-auth-2.9.1.jar) to method sun.security.krb5.Config.getInstance()
  WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
  WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
  WARNING: All illegal access operations will be denied in a future release
  Found 2 unexpected arguments on the command line [hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd/mrjob.zip#mrjob.zip, hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd/setup-wrapper.sh#setup-wrapper.sh]
  Try -help for more information
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 2 failed: Command '['C:\\hadoop-2.9.1\\bin\\hadoop.CMD', 'jar', 'C:\\hadoop-2.9.1\\share\\hadoop\\tools\\lib\\hadoop-streaming-2.9.1.jar', '-files', 'hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd/Jacobi_2classes.py#Jacobi_2classes.py,hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/x.txt', '-input', 'hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/b.txt', '-input', 'hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/files/matrix.txt', '-output', 'hdfs:///user/salha/tmp/mrjob/Jacobi_2classes.salha.20200303.052236.139525/step-output/0000', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 Jacobi_2classes.py --step-num=0 --mapper', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 Jacobi_2classes.py --step-num=0 --reducer']' returned non-zero exit status 1.

Why not use pyspark?

Thanks for the question. I am running two experiments: the first solves the problem with MapReduce, the second with Spark, and then I will compare them. Right now I am on the first experiment, solving the problem with MapReduce, and I have to invoke the job many times (e.g., 100 times).

Well, I think your installation is misconfigured, because

Can't fetch history log; missing job ID

seems to be the problem. I suggest reinstalling the Hadoop environment with Ambari.

Thanks for your reply. Does MRJob execute on Ambari? I wrote my code in Python with MRJob to run a multi-step job.

MRJob is just a Python library and can run anywhere. Ambari is a management/installation UI for Hadoop/YARN/Hive/Spark, etc.
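The comment above mentions invoking the job many times. With only the standard library, an outer driver loop over repeated script invocations could be sketched as follows (the script path and arguments are placeholders, not taken from the question):

```python
import subprocess
import sys

def build_command(script, inputs):
    """Compose the argv for one invocation of an MRJob script (local runner)."""
    return [sys.executable, script, *inputs]

def run_iterations(script, inputs, n):
    """Hypothetical driver: invoke the job script n times and collect return codes.

    A real Jacobi driver would also feed each run's output back in as the
    next run's input vector; that plumbing is omitted here.
    """
    codes = []
    for _ in range(n):
        result = subprocess.run(build_command(script, inputs),
                                capture_output=True, text=True)
        codes.append(result.returncode)
    return codes
```

mrjob also supports driving jobs programmatically via `MRJob.make_runner()`, which avoids the process-per-iteration overhead of a shell loop.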