How does Java find the Spark, Hadoop and AWS JARs in IntelliJ?
I am running a Spark application written in Java in IntelliJ. I added the Spark, Hadoop and AWS dependencies to pom.xml, but somehow the AWS credentials are not being loaded.
The exact error I get is:
Caused by: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
Below are my .java and pom.xml files.
SparkSession spark = SparkSession
.builder()
.master("local")
.config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
.config("spark.hadoop.fs.s3a.awsAccessKeyId", AWS_KEY)
.config("spark.hadoop.fs.s3a.awsSecretAccessKey", AWS_SECRET_KEY)
.getOrCreate();
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
Dataset<Row> dF = spark.read().load("s3a://bucket/abc.parquet");
Here is my pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.11.417</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
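I have been stuck on this for a while and have tried every solution I could find. I also exported the AWS keys in my environment.
Since there is no Java Spark shell like there is for Python or Scala, and pom.xml is the only way to declare dependencies, is there another way to specify the JARs or keys for Java?
I found that it works when the AWS credentials are set on the SparkContext's Hadoop configuration rather than through the SparkSession builder: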
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
sc.hadoopConfiguration().set("fs.s3a.access.key", AWS_KEY);
sc.hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET_KEY);
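The snippet above assumes that AWS_KEY and AWS_SECRET_KEY are already defined. Since the question mentions exporting the keys in the environment, here is a minimal sketch of loading them from environment variables and failing early when the JVM cannot see them. The class name EnvCredentials is hypothetical (it is not part of Spark, Hadoop or the AWS SDK), and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the usage note are assumed to be the ones that were exported.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark, Hadoop or the AWS SDK.
final class EnvCredentials {
    private EnvCredentials() {}

    // Returns the value of a required variable, or fails with a clear message
    // if it is missing or empty in the given environment map.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(
                "Environment variable " + name + " is not set for this JVM");
        }
        return value;
    }
}
```

It could be used as `String AWS_KEY = EnvCredentials.require(System.getenv(), "AWS_ACCESS_KEY_ID");` and likewise for the secret key. If the exception is thrown when running from IntelliJ, the variables exported in a shell may not have reached the IDE's run configuration; setting them in the run configuration's environment variables is one thing to try.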
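The key point is to add the AWS credentials to the Hadoop configuration. There is no need to create a separate JavaSparkContext; you can modify the sparkContext directly and add the AWS credentials like this: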
SparkSession spark = SparkSession.builder()
.master("local")
.appName("AWSFileRead")
.getOrCreate();
spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", AWS_KEY);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET_KEY);
Dataset<Row> awsFileDataset = spark.read().option("header", "true")
.csv("s3a://your_bucket_name/file_name.csv");
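One detail worth noting: the builder in the question used fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey, while the working snippets set fs.s3a.access.key and fs.s3a.secret.key, which are the property names the S3A connector reads. Spark copies settings prefixed with spark.hadoop. into the Hadoop configuration, so the same keys can probably be passed through the builder as well. A minimal sketch, assuming a hypothetical helper class S3aProps (not part of Spark or Hadoop) that only builds the property maps:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it only builds property maps.
final class S3aProps {
    private S3aProps() {}

    // Credential property names read by the S3A connector.
    static Map<String, String> hadoopProps(String accessKey, String secretKey) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3a.access.key", accessKey);
        props.put("fs.s3a.secret.key", secretKey);
        return props;
    }

    // The same settings with the "spark.hadoop." prefix, suitable for
    // SparkSession.builder().config(key, value).
    static Map<String, String> sparkProps(String accessKey, String secretKey) {
        Map<String, String> props = new LinkedHashMap<>();
        hadoopProps(accessKey, secretKey)
            .forEach((key, value) -> props.put("spark.hadoop." + key, value));
        return props;
    }
}
```

Each entry of `S3aProps.sparkProps(AWS_KEY, AWS_SECRET_KEY)` could be passed to `.config(key, value)` on the builder before `getOrCreate()`, or each entry of `hadoopProps(...)` to `spark.sparkContext().hadoopConfiguration().set(key, value)` as in the snippet above.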