Python: plotting data from Apache Pig via matplotlib
Tags: python, matplotlib, streaming, apache-pig

I am trying to plot some data from Apache Pig using Python/matplotlib. Specifically, I want to read and process the data with Pig, and then stream it through a plotting script written in Python. I have been using the plotting script outside of Apache Pig for quite some time without incident, so I'm fairly sure it is not the problem, but I can post it if anyone wants me to.

Here is my Pig script:
%default BINSIZE 5.0
/* functions */
define plot `test_plot.py -f output_image.png` ship('/tank/user/eric/dev/pig/test_plot.py');
/* load the data */
cd /scratch;
VALUE = load 'test_data.txt' as (x_val:double);
/* bin the data */
BINNED_VAL = foreach VALUE
generate (double)((int)( x_val / $BINSIZE )) * $BINSIZE;
/* make a histogram */
COUNTED = group BINNED_VAL by $0;
HIST = foreach COUNTED generate group, COUNT(BINNED_VAL);
A = stream HIST through plot;
dump A;
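The -f flag of test_plot.py specifies the output file. The script reads data from stdin but writes nothing to stdout, so A is never actually set to anything, which means dump A has nothing to output. The job also throws an error.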
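To illustrate the stdin/stdout point above: the sketch below is not the original test_plot.py (which is not shown here) but a hypothetical stand-in. It assumes the streamed records arrive as tab-separated `bin<TAB>count` lines, and it is written for current Python 3 rather than Python 2.6. The function names `read_records`, `save_bar_chart` and `echo_records` are made up for illustration. The idea is to write the image as a side effect and also echo every record to stdout, so that the streamed relation is not empty.

```python
import sys


def read_records(stream):
    """Parse tab-separated (bin, count) lines; blank lines are skipped."""
    records = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        bin_start, count = line.split("\t")
        records.append((float(bin_start), int(count)))
    return records


def save_bar_chart(records, image_path, bin_width=5.0):
    """Write the histogram to image_path as a bar chart."""
    # Select a non-interactive backend before importing pyplot,
    # since a task node usually has no display attached.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar([r[0] for r in records], [r[1] for r in records],
           width=bin_width, align="edge")
    fig.savefig(image_path)
    plt.close(fig)


def echo_records(records, out_stream):
    """Write the records back out so the streamed relation has content."""
    for bin_start, count in records:
        out_stream.write("%s\t%d\n" % (bin_start, count))
```

A script built from these pieces would call `read_records(sys.stdin)`, then `save_bar_chart(...)` with the path given by -f, then `echo_records(records, sys.stdout)`. Any uncaught exception along the way makes the process exit with a non-zero status, which Pig reports as a streaming failure.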
Here are the contents of test_data.txt:
5
5
6
6.5
8
12
28
25
25
25
26
29
32
35
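As a sanity check, the binning and counting steps of the Pig script can be reproduced outside Pig on this data. This is a minimal sketch, assuming that Python's `int()` truncation behaves like Pig's `(int)` cast for these non-negative values; `bin_value` is a name introduced here for illustration.

```python
from collections import Counter

BINSIZE = 5.0
values = [5, 5, 6, 6.5, 8, 12, 28, 25, 25, 25, 26, 29, 32, 35]


def bin_value(x, binsize=BINSIZE):
    # Mirrors (double)((int)(x_val / $BINSIZE)) * $BINSIZE from the Pig script.
    return float(int(x / binsize)) * binsize


# Equivalent of the group-by and COUNT steps.
histogram = Counter(bin_value(v) for v in values)

for bin_start in sorted(histogram):
    print("%s\t%d" % (bin_start, histogram[bin_start]))
```

Under that assumption the 14 input values fall into five bins (5.0, 10.0, 25.0, 30.0 and 35.0), which is what the plotting script should receive as HIST.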
Here is the error message I receive:
2014-07-07 12:49:30,973 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
2014-07-07 12:49:30,973 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-07-07 12:49:30,974 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0.2.1.2.1-471 0.12.1.2.1.2.1-471 eric 2014-07-07 12:48:57 2014-07-07 12:49:30 GROUP_BY,STREAMING
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1404713698289_0021 A,BINNED_VAL,COUNTED,HIST,VALUE GROUP_BY,STREAMING,COMBINER Message: Job failed! hdfs://hypno.st.hmc.edu:8020/tmp/temp-2122498041/tmp461187682,
Input(s):
Failed to read data from "hdfs://hypno.st.hmc.edu:8020/scratch/test_data.txt"
Output(s):
Failed to produce result in "hdfs://hypno.st.hmc.edu:8020/tmp/temp-2122498041/tmp461187682"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1404713698289_0021
2014-07-07 12:49:30,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-07 12:49:30,986 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
Details at logfile: /tank/user/eric/dev/pig/pig_1404762535492.log
Here is the output log file:
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_m_000000_0 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_0 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_0 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_1 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_1 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_2 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_2 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1404713698289_0021_r_000000_3 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Error message from task (reduce) task_1404713698289_0021_r_000000
-----------------------------------------------------------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
Error message from task (reduce) task_1404713698289_0021_r_000000
-----------------------------------------------------------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
Error message from task (reduce) task_1404713698289_0021_r_000000
-----------------------------------------------------------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
Error message from task (reduce) task_1404713698289_0021_r_000000
-----------------------------------------------------------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
Pig Stack Trace
---------------
ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.PigServer.openIterator(PigServer.java:872)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'test_plot.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' failed with exit status: 1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:496)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.cleanup(PigGenericMapReduce.java:522)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
================================================================================
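My Pig version is Apache Pig version 0.12.1.2.1.2.1-471, and I am using Python 2.6.6.
I am also very new to Pig, so I apologize if I am missing something silly.
If anyone can point me in the right direction, I would greatly appreciate it.

Do you have Python installed on all of your nodes? I would suggest starting in local mode and getting it to work there first. Also, I'm not sure that dumping a PNG file will work...

Yes, this is a single-node test installation with Python 2.6.6 installed. I am also not trying to dump a PNG; I am trying to write a PNG to the local filesystem from within the Python script. The dump may be part of my problem, since it doesn't actually dump anything. Is there a way to end the script without a dump?

Can you post your Python script? Maybe start with a very simple Python script that just writes one line to stdout, to find out where the error is.

By the way, Pig adds parentheses to the data; can your Python script handle that?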
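The two debugging suggestions in the comments (start from a trivial script that only writes to stdout, and check whether the input carries parentheses) could be combined in a small pass-through script. This is a sketch under assumptions: whether records actually arrive wrapped in parentheses depends on how Pig serializes them, so the hypothetical helper `clean_fields` simply accepts both `(5.0,5)` and `5.0<TAB>5`. It is written for current Python 3.

```python
def clean_fields(line):
    """Split one streamed line into fields, tolerating optional parentheses."""
    text = line.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # Accept either tab-separated or comma-separated fields.
    separator = "\t" if "\t" in text else ","
    return [field.strip() for field in text.split(separator)]


def passthrough(in_stream, out_stream):
    """Echo each record as tab-separated text and return the record count."""
    count = 0
    for line in in_stream:
        if not line.strip():
            continue
        out_stream.write("\t".join(clean_fields(line)) + "\n")
        count += 1
    return count
```

Calling `passthrough(sys.stdin, sys.stdout)` from a script used in place of test_plot.py would show whether the streaming step itself works before any plotting code is involved; if dump A then prints the histogram rows, the failure is more likely inside the plotting script.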