Java: why, when running Spark in local mode, do I need to perform a read with the DataFrames API in order to authenticate with AWS?


This code works and gets through AWS authentication:

import java.io.IOException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class Test {
    public static void main(String[] args) throws IOException {
        // AWSCredentials is the question's own helper class; it exposes
        // access_key_id, secret_access_key and (optionally) session_token.
        AWSCredentials h = new AWSCredentials();
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("Test")
                .set("fs.s3a.access.key", h.access_key_id)
                .set("fs.s3a.secret.key", h.secret_access_key);
        if (h.session_token != null) {
            conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
            conf.set("fs.s3a.session.token", h.session_token);
        }
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        long count = spark.read().text("s3a://mybucket/path-to-files/file+9+0000000223.bin").javaRDD().count();
        System.out.println("count from scala spark is: " + count);
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        JavaRDD<String> maxwellRdd = sc.textFile("s3a://mybucket/path-to-files/*");
        System.out.println("count is: " + maxwellRdd.count());

        sc.stop();
    }
}

I don't believe your first one actually works; more precisely, if it does work, it is because something picked up your credentials from environment variables or from the EC2 IAM settings.

If you are trying to set the s3a options in the Spark conf, you need to prefix each of them with "spark.hadoop.".
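A minimal sketch of what that looks like with the code from the question (AWSCredentials and its fields are the asker's own helper; only the property names change):

SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("Test")
        // "spark.hadoop."-prefixed properties are copied by Spark into the
        // Hadoop Configuration that the s3a filesystem actually reads
        .set("spark.hadoop.fs.s3a.access.key", h.access_key_id)
        .set("spark.hadoop.fs.s3a.secret.key", h.secret_access_key);
if (h.session_token != null) {
    conf.set("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
    conf.set("spark.hadoop.fs.s3a.session.token", h.session_token);
}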


Quick test: once you have created the Spark context, call sc.hadoopConfiguration and look the options up there (they are all defined in org.apache.hadoop.fs.s3a.Constants, if you want to be 100% sure you haven't made any typos).
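For instance, a quick check along these lines (a sketch only; sc is the JavaSparkContext from the question, and Constants.ACCESS_KEY / Constants.SECRET_KEY are the hadoop-aws names for "fs.s3a.access.key" and "fs.s3a.secret.key"):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Constants;

// Confirm the s3a credentials actually made it into the Hadoop configuration.
Configuration hc = sc.hadoopConfiguration();
System.out.println("access key set: " + (hc.get(Constants.ACCESS_KEY) != null));
System.out.println("secret key set: " + (hc.get(Constants.SECRET_KEY) != null));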

The first one does actually work, since it reads the file; presumably it is picking up my credentials via ~/.aws/credentials, which seems odd to me (there is no default profile). Aaaanyway, you are completely right that the options need to be prefixed with spark.hadoop.; that fixed it. Cheers.

I doubt it gets them from ~/.aws/credentials, but if you have the AWS_ environment variables set, spark-submit picks them up automatically and turns them into fs.s3n/s3a properties. Good to see everything is working.

For reference, here is the failing variant from the question (the same program with the DataFrames read commented out), followed by the build dependencies:
public class Test {
    public static void main(String[] args) throws IOException {
        AWSCredentials h = new AWSCredentials();
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("Test")
                .set("fs.s3a.access.key", h.access_key_id)
                .set("fs.s3a.secret.key", h.secret_access_key);
        if (h.session_token != null) {
            conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
            conf.set("fs.s3a.session.token", h.session_token);
        }
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        //long count = spark.read().text("s3a://mybucket/path-to-files/file+9+0000000223.bin").javaRDD().count();
        //System.out.println("count from scala spark is: " + count);
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        JavaRDD<String> maxwellRdd = sc.textFile("s3a://mybucket/path-to-files/*");
        System.out.println("count is: " + maxwellRdd.count());

        sc.stop();
    }
}
dependencies {
    compile group: 'org.ini4j', name: 'ini4j', version: '0.5.4'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.8'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.2.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '2.8.3'
    //compile group: 'com.amazonaws', name: 'aws-java-sdk', version: '1.11.313'
    testCompile group: 'junit', name: 'junit', version: '4.12'
}