
How do I set an AWS proxy host in the Spark configuration?


I'd like to know how to set an AWS proxy host and region on a Spark session or Spark context.

I am able to set these in AWS Java SDK code, and it works fine:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    ClientConfiguration clientConfig = new ClientConfiguration();
    clientConfig.setProxyHost("aws-proxy-qa.xxxxx.organization.com");
    clientConfig.setProxyPort(8099);

    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion(getAWSRegion(Regions.US_WEST_2))
        .withClientConfiguration(clientConfig) // setting the AWS proxy host
        .build();

Could someone help me set the same things (region and proxy) on the Spark context? The region of the S3 files I am reading is different from the EMR region.

The region will be determined automatically from fs.s3a.access.key and fs.s3a.secret.key.

Like the other S3A properties, set the proxy host on the SparkConf:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Example: build a SparkSession configured for S3A access.
 * @return a SparkSession with the S3A endpoint and proxy host applied
 */
def getSparkSessionForS3(): SparkSession = {
  val conf = new SparkConf()
    .setAppName("testS3File")
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.hadoop.fs.s3a.endpoint", "yourendpoint")
    .set("spark.hadoop.fs.s3a.connection.maximum", "200")
    .set("spark.hadoop.fs.s3a.fast.upload", "true")
    .set("spark.hadoop.fs.s3a.connection.establish.timeout", "500")
    .set("spark.hadoop.fs.s3a.connection.timeout", "5000")
    .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    .set("spark.hadoop.com.amazonaws.services.s3.enableV4", "true")
    .set("spark.hadoop.com.amazonaws.services.s3.enforceV4", "true")
    .set("spark.hadoop.fs.s3a.proxy.host", "yourhost") // the AWS proxy host
  val spark = SparkSession
    .builder()
    .config(conf)
    .getOrCreate()
  spark
}
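
The Java snippet in the question also sets a proxy port; hadoop-aws exposes fs.s3a.proxy.port alongside fs.s3a.proxy.host, so it can be set the same way. Below is a minimal usage sketch; the bucket and path are hypothetical placeholders:

val spark = getSparkSessionForS3()

// fs.s3a.proxy.port complements fs.s3a.proxy.host when the proxy does not
// listen on the default port (8099 here, matching the Java SDK example).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.port", "8099")

// Hypothetical bucket and path, for illustration only.
val lines = spark.read.textFile("s3a://your-bucket/path/to/file.txt")
lines.show(5)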

Looks good, but (a) you don't need the fs.s3a.impl entry, and (b) I don't think those com.amazonaws options are picked up by the S3A client. The Hadoop S3A documentation covers the switch to V4 signing (which will soon become mandatory).
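
For reference, a minimal sketch of the switch that comment points to, going by the Hadoop S3A documentation: V4-only regions are selected by pointing fs.s3a.endpoint at the region-specific endpoint rather than through the com.amazonaws.* entries. The region below (eu-central-1, a V4-only region) is an assumed example:

import org.apache.spark.SparkConf

// Sketch only: fs.s3a.endpoint selects the region-specific endpoint (and with
// it V4 signing); the com.amazonaws.services.s3.* entries above are not read
// by the S3A client.
val v4Conf = new SparkConf()
  .setAppName("testS3File")
  .set("spark.hadoop.fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com") // assumed V4-only region
  .set("spark.hadoop.fs.s3a.proxy.host", "yourhost")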