When calling Java from Python, the output appears in stderr

I am launching a MapReduce job from Python using the code in [1]. The problem is that the useful output data arrives in the stderr field [3] rather than in the stdout field [2]. Why does the correct data end up in stderr? Am I using Popen.communicate correctly? Is there a better way to launch Java execution from Python (rather than Jython)?

[1] The code snippet I use to launch the job in Hadoop:

import shlex
import subprocess

command = ("/home/xubuntu/Programs/hadoop/bin/hadoop jar "
           "/home/xubuntu/Programs/hadoop/medusa-java.jar mywordcount "
           "-Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1443004585/job.attributes "
           "/input1 /output1")

try:
    process = subprocess.Popen(shlex.split(command),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    print("Out %s" % out)
    print("Error %s" % err)

    if len(err) > 0:  # anything on stderr is treated as an exception
        # print("Going to launch exception")
        raise ValueError("Exception:\n" + err)
except ValueError as e:
    return e.message

return out
[2] Output in stdoutdata:

[2015-09-23 07:16:13,220: WARNING/Worker-17] Out My Setup
My get job name
My get job name
My get job name
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner
---> Job 0: /input1, : /output1-1443006949
10.10.5.192
10.10.5.192:8032
[3] Output in stderrdata:

[2015-09-23 07:16:13,221: WARNING/Worker-17] Error 15/09/23 07:15:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/23 07:15:53 INFO client.RMProxy: Connecting to ResourceManager at  /10.10.5.192:8032
15/09/23 07:15:54 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 07:15:54 INFO input.FileInputFormat: Total input paths to process : 4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: number of splits:4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442999930174_0009
15/09/23 07:15:54 INFO impl.YarnClientImpl: Submitted application application_1442999930174_0009
15/09/23 07:15:54 INFO mapreduce.Job: The url to track the job: http://hadoop-coc-1:9046/proxy/application_1442999930174_0009/
15/09/23 07:15:54 INFO mapreduce.Job: Running job: job_1442999930174_0009
15/09/23 07:16:00 INFO mapreduce.Job: Job job_1442999930174_0009 running in uber mode : false
15/09/23 07:16:00 INFO mapreduce.Job:  map 0% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: Job job_1442999930174_0009 completed successfully
15/09/23 07:16:13 INFO mapreduce.Job: Counters: 30
    File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=423900
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=472
            HDFS: Number of bytes written=148
            HDFS: Number of read operations=20
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=8
    Job Counters 
            Launched map tasks=4
            Data-local map tasks=4
            Total time spent by all maps in occupied slots (ms)=41232
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=41232
            Total vcore-seconds taken by all map tasks=41232
            Total megabyte-seconds taken by all map tasks=42221568
    Map-Reduce Framework
            Map input records=34
            Map output records=34
            Input split bytes=406
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=532
            CPU time spent (ms)=1320
            Physical memory (bytes) snapshot=245039104
            Virtual memory (bytes) snapshot=1272741888
            Total committed heap usage (bytes)=65273856
    File Input Format Counters 
Hadoop (specifically Log4j) simply logs all [INFO] messages to stderr, in line with its configuration:

By default, Hadoop logs messages to Log4j. Log4j is configured via log4j.properties on the classpath. This file defines both what is logged and where. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Servers log to "INFO,DRFA", which logs to a file that is rolled daily. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-<server>.log.
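
Given that, the len(err) > 0 check in [1] will raise even on a fully successful run, because Hadoop's INFO logging always lands on stderr. A minimal sketch of my own (not from the original poster) that instead uses the process exit code to decide whether the job failed:

import shlex
import subprocess

def run_hadoop_job(command):
    """Run a hadoop command; treat a non-zero exit code as failure."""
    process = subprocess.Popen(shlex.split(command),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    if process.returncode != 0:
        # Real failures are signalled by the exit code, not by the mere
        # presence of stderr output (which also carries the INFO logs).
        raise RuntimeError("hadoop exited with %d:\n%s"
                           % (process.returncode, err))
    return out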

I have never tried redirecting these logs to stdout myself, so I can't really help with that part, but another user suggested:

// Answer by Rajkumar Singh:
// To get your stdout and log messages on the console you can use the
// Apache Commons Logging framework in your mapper and reducer.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// LongWritable/Text stand in for the generic types that were elided
// in the original answer.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    public static final Log log = LogFactory.getLog(MyMapper.class);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // this goes to the task's stdout file
        System.out.println("Map key " + key);

        // this goes to the task's syslog file
        log.info("Map key " + key);

        if (log.isDebugEnabled()) {
            log.debug("Map key " + key);
        }
        context.write(key, value);
    }
}
I suggest giving it a try.
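
Note that with this approach the System.out.println output never comes back through your Popen pipes: it ends up in the stdout file of each task container on the cluster. If log aggregation is enabled, one way to pull it back from Python (my own sketch, reusing the application id printed in the submission log above) is:

import subprocess

# application id as printed in the stderr log above
app_id = "application_1442999930174_0009"

# 'yarn logs' prints the aggregated container logs (stdout, stderr,
# syslog) of a finished application
logs = subprocess.check_output(["yarn", "logs", "-applicationId", app_id])
print(logs)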

I am confused. From your stderr debug information I can only see that Hadoop is using System.err.println.
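
For completeness: if the goal is simply to see the job's log lines and its real output in a single stream on the Python side, Popen can fold stderr into stdout (a minimal sketch of mine, not from the answers above; alternatively, the console appender's target could in principle be switched from System.err to System.out in log4j.properties):

import shlex
import subprocess

# hypothetical shortened command for illustration
command = "hadoop jar medusa-java.jar mywordcount /input1 /output1"

# stderr=subprocess.STDOUT merges Hadoop's Log4j output into the same
# stream as stdout, so 'out' contains both and the second value is None
process = subprocess.Popen(shlex.split(command),
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
out, _ = process.communicate()
print(out)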