Apache Pig: Running Pig on Tez on Amazon EMR-4


I am trying to run Pig on Tez on Amazon EMR 4.5.0. The configuration works without Tez; I just want to get it working on Tez.

To create the cluster (from the command line), we use the following (TEZ_VERSION is defined as 0.5.2):

In addition, I am overriding PIG_CLASSPATH:

--configurations file://pig_tez_classification.json
which contains:

[
  {
    "Classification": "hadoop-env",
    "Properties": {

    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PIG_CLASSPATH": "/etc/tez/conf"
        },
        "Configurations": [

        ]
      }
    ]
  }
]
PIG_CLASSPATH is needed to prevent this error:

ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Cannot submit DAG
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob (PigTezLauncher-0): Cannot submit DAG
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-3-207.eu-west-1.compute.internal:8020/apps/tez/0.5.2/tez-0.5.2-minimal.tar.gz
WARN  org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Exception while gathering stats
java.lang.NullPointerException
    at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:191)
    at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:180)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:194)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:167)
The tez.lib.uris override is needed to prevent this error:

ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Cannot submit DAG
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob (PigTezLauncher-0): Cannot submit DAG
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-3-207.eu-west-1.compute.internal:8020/apps/tez/0.5.2/tez-0.5.2-minimal.tar.gz
WARN  org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Exception while gathering stats
java.lang.NullPointerException
    at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:191)
    at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:180)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:194)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:167)
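The install script appears to write the tar.gz file to the correct location in HDFS, but when I log in via ssh afterwards, the file is not there. I think that in EMR-4 bootstrap actions run at a different time than before, namely before HDFS is available.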

After all of this, I still get this error:

ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Cannot submit DAG
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob (PigTezLauncher-0): Cannot submit DAG
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-3-207.eu-west-1.compute.internal:8020/apps/tez/0.5.2/tez-0.5.2-minimal.tar.gz
WARN  org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Exception while gathering stats
java.lang.NullPointerException
    at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:191)
    at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:180)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:194)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:167)
Trying Tez version 0.8.2 instead gives:

ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - Cannot submit DAG
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown
This seems to be caused by a mismatch in the Tez version actually being used, because the script statistics still print:

INFO  org.apache.pig.tools.pigstats.tez.TezPigScriptStats  - Script Statistics:

       HadoopVersion: 2.7.2-amzn-0                                                                                        
          PigVersion: 0.14.0-amzn-0                                                                                       
          TezVersion: 0.5.2                                                                                               
              UserId: hadoop
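
The version check described above can be automated. The following is a minimal sketch, not part of the original setup: it assumes the Pig output has been captured as text, and the helper name reported_tez_version is hypothetical. It extracts the TezVersion field so it can be compared with the version that was installed.

```python
import re
from typing import Optional


def reported_tez_version(log_text: str) -> Optional[str]:
    """Return the value of the 'TezVersion:' field in Pig's script statistics,
    or None if the field is not present in the given text."""
    match = re.search(r"TezVersion:\s*(\S+)", log_text)
    return match.group(1) if match else None


# Sample taken from the statistics block quoted in the question.
SAMPLE_STATS = """
       HadoopVersion: 2.7.2-amzn-0
          PigVersion: 0.14.0-amzn-0
          TezVersion: 0.5.2
              UserId: hadoop
"""

if __name__ == "__main__":
    installed = "0.8.2"  # the version the cluster was meant to use
    reported = reported_tez_version(SAMPLE_STATS)
    if reported != installed:
        print(f"Tez version mismatch: installed {installed}, Pig reports {reported}")
```

A mismatch here suggests that Pig is still picking up the older Tez client jars from its classpath, which is consistent with the SessionNotRunning error.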

So, does anyone know how to run Pig on Tez on Amazon EMR (any version)?

I got it running successfully on emr-4.4.0. However, I could not get Pig to correctly use the tar.gz uploaded to HDFS. Instead, I had to unpack the tarball, upload all the individual files, and then set tez.lib.uris to hdfs:///apps/tez-0.8.2,hdfs:///apps/tez-0.8.2/lib
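
The workaround above could be sketched roughly as follows. This is an illustration under assumptions, not the exact commands used: the tarball name and the local staging directory are hypothetical, the tarball is assumed to unpack to jars at the top level plus a lib/ subdirectory (which is what the tez.lib.uris value implies), and the target HDFS directory is assumed not to exist yet, so that hdfs dfs -put creates it from the local directory. The script only builds and prints the commands; nothing is executed.

```python
from typing import List


def tez_lib_uris(hdfs_dir: str) -> str:
    """Build a tez.lib.uris value that lists the HDFS directory holding the
    unpacked Tez jars and its lib/ subdirectory."""
    base = hdfs_dir.rstrip("/")
    return f"{base},{base}/lib"


def upload_commands(tarball: str, local_dir: str, hdfs_dir: str) -> List[List[str]]:
    """Return the commands (as argv lists) that unpack the tarball locally and
    copy the unpacked directory into HDFS. The commands are not run here."""
    hdfs_parent = hdfs_dir.rstrip("/").rsplit("/", 1)[0] + "/"
    return [
        ["mkdir", "-p", local_dir],
        ["tar", "-xzf", tarball, "-C", local_dir],
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_parent],
        # Assumes hdfs_dir does not exist yet, so the local directory is
        # copied to that path rather than nested inside it.
        ["hdfs", "dfs", "-put", local_dir, hdfs_dir],
    ]


if __name__ == "__main__":
    hdfs_dir = "hdfs:///apps/tez-0.8.2"
    # Hypothetical tarball name and staging directory.
    for cmd in upload_commands("tez-0.8.2.tar.gz", "/tmp/tez-0.8.2", hdfs_dir):
        print(" ".join(cmd))
    print("tez.lib.uris =", tez_lib_uris(hdfs_dir))
```

Printing the commands first makes it easy to review the paths before running them by hand on the master node, once HDFS is up.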

It looks like the install-tez.rb script provided by Amazon (s3://support.elasticmapreduce/tez/bigtop/install-tez.rb) also unpacks the tarball, but only uploads the tar.gz to HDFS (see the install_tez function). I will try your version and approach.