
Java: reading Parquet data from an AWS S3 bucket


I need to read Parquet data from an AWS S3 bucket. If I use the AWS SDK for this, I can get an InputStream like this:

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, bucketKey));
InputStream inputStream = object.getObjectContent();

But the Apache Parquet reader only works with a local file, like this:

ParquetReader<Group> reader =
                    ParquetReader.builder(new GroupReadSupport(), new Path(file.getAbsolutePath()))
                            .withConf(conf)
                            .build();
reader.read();
So I don't know how to parse an InputStream of a Parquet file. For CSV files, for example, there is a CSVParser that works with an InputStream.

I know of a solution that uses Spark to achieve this, like so:

SparkSession spark = SparkSession
                .builder()
                .getOrCreate();
Dataset<Row> ds = spark.read().parquet("s3a://bucketName/file.parquet");
But I can't use Spark.

Could anyone tell me a solution for reading Parquet data from S3?

Thanks.

How do we provide AWS credentials to this code? I get: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider

Answering my own question: I set the access key and secret key like this:

Configuration conf = new Configuration();
conf.set("fs.s3a.access.key", "xxxxxxxxxxxx");
conf.set("fs.s3a.secret.key", "xxxxxxxxxxxxxxxxxxxx");
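Hard-coded keys can also be replaced by a credentials provider chain. A minimal sketch, assuming hadoop-aws and the AWS SDK v1 are on the classpath (fs.s3a.aws.credentials.provider is a standard hadoop-aws property):

// Let s3a resolve credentials from the standard AWS chain
// (environment variables, system properties, instance profile, ...)
Configuration conf = new Configuration();
conf.set("fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");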
Just an update:

AvroParquetReader.builder(path)

is deprecated and will be removed in version 2.0.0.
String SCHEMA_TEMPLATE = "{" +
                        "\"type\": \"record\",\n" +
                        "    \"name\": \"schema\",\n" +
                        "    \"fields\": [\n" +
                        "        {\"name\": \"timeStamp\", \"type\": \"string\"},\n" +
                        "        {\"name\": \"temperature\", \"type\": \"double\"},\n" +
                        "        {\"name\": \"pressure\", \"type\": \"double\"}\n" +
                        "    ]" +
                        "}";
String PATH_SCHEMA = "s3a";
Path internalPath = new Path(PATH_SCHEMA, bucketName, folderName);
Schema schema = new Schema.Parser().parse(SCHEMA_TEMPLATE);
Configuration configuration = new Configuration();
// Push the projection down so only the requested columns are read
AvroReadSupport.setRequestedProjection(configuration, schema);
ParquetReader<GenericRecord> parquetReader =
        AvroParquetReader.<GenericRecord>builder(internalPath)
                .withConf(configuration)
                .build();
GenericRecord genericRecord = parquetReader.read();

while (genericRecord != null) {
        // Copy to an effectively final local so the lambda below can capture it
        GenericRecord record = genericRecord;
        Map<String, String> valuesMap = new HashMap<>();
        record.getSchema().getFields().forEach(field ->
                valuesMap.put(field.name(), record.get(field.name()).toString()));

        genericRecord = parquetReader.read();
}
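Since builder(path) is deprecated, newer parquet-avro versions take an org.apache.parquet.io.InputFile instead. Below is a minimal sketch of the same read using HadoopInputFile; readParquetFromS3 is a hypothetical helper name, and it assumes hadoop-aws is on the classpath with s3a credentials configured as above:

import java.io.IOException;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public static void readParquetFromS3(String bucketName, String key) throws IOException {
    Configuration configuration = new Configuration();
    // Wrap the s3a path in an InputFile; the s3a connector performs the S3 I/O
    HadoopInputFile inputFile = HadoopInputFile.fromPath(
            new Path("s3a://" + bucketName + "/" + key), configuration);
    // try-with-resources closes the reader (ParquetReader implements Closeable)
    try (ParquetReader<GenericRecord> parquetReader =
                 AvroParquetReader.<GenericRecord>builder(inputFile).build()) {
        GenericRecord record;
        while ((record = parquetReader.read()) != null) {
            System.out.println(record);
        }
    }
}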