How do I configure S3 access for org.apache.parquet.avro.AvroParquetReader in Java?
I struggled with this for a while and want to share my solution. AvroParquetReader is a great tool for reading Parquet files, but its defaults for S3 access are weak:
java.io.InterruptedIOException: doesBucketExist on MY_BUCKET: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.AmazonClientException: Unable to load credentials from service endpoint
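I want to use a credentials provider like com.amazonaws.auth.profile.ProfileCredentialsProvider, which works for accessing my S3 bucket, but it is not clear from AvroParquetReader's class definition or documentation how to do that.

This code works for me. It lets AvroParquetReader access S3 using ProfileCredentialsProvider: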
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.hadoop.fs.Path;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
...
final String path = "s3a://"+bucketName+"/"+pathName;
final Configuration configuration = new Configuration();
configuration.setClass("fs.s3a.aws.credentials.provider", ProfileCredentialsProvider.class,
AWSCredentialsProvider.class);
ParquetReader<GenericRecord> parquetReader =
AvroParquetReader.<GenericRecord>builder(new Path(path)).withConf(configuration).build();
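For anyone else who runs into this: @jd_free's answer did not work for me. The only thing I needed to change was the configuration passed to AvroParquetReader, so that it names the type of AWSCredentialsProvider to use and supplies the keys:
Configuration configuration = new Configuration();
configuration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
configuration.set("fs.s3a.access.key", "ACCESS_KEY"); // placeholder: your AWS access key
configuration.set("fs.s3a.secret.key", "SECRET_KEY"); // placeholder: your AWS secret key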
The problem was with the credentials and the way they were supplied to the configuration. For more information on the different credentials providers, check out the documentation on credential providers. It explains the different types available for different scenarios, including how to pick up credentials from environment variables.
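To avoid hardcoding keys, the settings can be derived from environment variables. This is only a minimal sketch, not part of either answer: the class S3aEnvSettings and its method fromEnv are hypothetical names. It assumes the standard AWS variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN, and that hadoop-aws is on the classpath so the provider class names resolve at runtime. It uses only the JDK and returns a plain map of fs.s3a.* properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps standard AWS environment variables to fs.s3a.* settings. */
final class S3aEnvSettings {

    private S3aEnvSettings() {}

    /**
     * Builds fs.s3a.* properties from an environment-like map.
     * Throws IllegalStateException if the access key or secret key is missing or blank.
     */
    static Map<String, String> fromEnv(Map<String, String> env) {
        String accessKey = require(env, "AWS_ACCESS_KEY_ID");
        String secretKey = require(env, "AWS_SECRET_ACCESS_KEY");
        String sessionToken = env.get("AWS_SESSION_TOKEN");

        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.s3a.access.key", accessKey);
        settings.put("fs.s3a.secret.key", secretKey);
        if (sessionToken != null && !sessionToken.isBlank()) {
            // Temporary (session) credentials need the token and a matching provider.
            settings.put("fs.s3a.aws.credentials.provider",
                    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
            settings.put("fs.s3a.session.token", sessionToken);
        } else {
            // Long-lived keys: the same provider used in the answer above.
            settings.put("fs.s3a.aws.credentials.provider",
                    "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
        }
        return settings;
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " is not set");
        }
        return value;
    }
}
```

With a Hadoop Configuration in hand, the map could then be applied with S3aEnvSettings.fromEnv(System.getenv()).forEach(configuration::set) before passing the configuration to withConf(...).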