Java Q: Convert Avro to Parquet in memory

Tags: java, hadoop, avro, parquet

I am receiving Avro records from Kafka, and I would like to convert these records into Parquet files. I am following this blog post:

So far, the code looks roughly like this:

// Available in the sink task: the target file name, the Kafka Connect
// record, and Confluent's AvroData converter.
// final String fileName; SinkRecord record; final AvroData avroData;

final Schema avroSchema = avroData.fromConnectSchema(record.valueSchema());
CompressionCodecName compressionCodecName = CompressionCodecName.SNAPPY;

int blockSize = 256 * 1024 * 1024;
int pageSize = 64 * 1024;

Path path = new Path(fileName);
// Writes the Parquet file to the given Path on disk.
writer = new AvroParquetWriter<>(path, avroSchema, compressionCodecName, blockSize, pageSize);
Now, this does the Avro-to-Parquet conversion, but it writes the Parquet file to disk. I was wondering if there is an easier way to just keep the file in memory, so that I don't have to manage temporary files on disk. Thank you.

"but it will write the Parquet file to the disk"
"if there was an easier way to just keep the file in memory"
From your question, I understand that you do not want to write partial files as Parquet. If you want the complete file written to disk in Parquet format, with the temporary data held in memory, you can use a combination of a memory-mapped file and the Parquet format.

Write the data into a memory-mapped file, and once writing is complete, convert the bytes to the Parquet format and store them on disk (a sketch of the staging step follows below).

Take a look.
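A minimal sketch of that staging step with java.nio, assuming a hypothetical staging.bin backing file and a fixed 16 MiB mapping (neither detail is from the answer):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedStagingExample {
    public static void main(String[] args) throws IOException {
        Path staging = Path.of("staging.bin"); // hypothetical backing file
        try (FileChannel channel = FileChannel.open(staging,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Writes land in the OS page cache rather than going straight to disk.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 16 * 1024 * 1024);
            buffer.put("serialized record bytes".getBytes(StandardCharsets.UTF_8)); // stand-in payload
            buffer.force(); // flush the dirty pages once the batch is complete
        }
    }
}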

Please check out my blog (translate it into English if necessary):

package yanbin.blog;
 
import org.apache.parquet.io.DelegatingPositionOutputStream;
import org.apache.parquet.io.OutputFile;
import org.apache.parquet.io.PositionOutputStream;
 
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
 
public class InMemoryOutputFile implements OutputFile {
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();
 
    @Override
    public PositionOutputStream create(long blockSizeHint) throws IOException { // Mode.CREATE calls this method
        return new InMemoryPositionOutputStream(baos);
    }
 
    @Override
    public PositionOutputStream createOrOverwrite(long blockSizeHint) throws IOException { // Mode.OVERWRITE calls this method
        baos.reset(); // discard any previously written bytes before reusing the buffer
        return new InMemoryPositionOutputStream(baos);
    }
 
    @Override
    public boolean supportsBlockSize() {
        return false;
    }
 
    @Override
    public long defaultBlockSize() {
        return 0;
    }
 
    public byte[] toArray() {
        return baos.toByteArray();
    }
 
    private static class InMemoryPositionOutputStream extends DelegatingPositionOutputStream {
 
        public InMemoryPositionOutputStream(OutputStream outputStream) {
            super(outputStream);
        }
 
        @Override
        public long getPos() throws IOException {
            return ((ByteArrayOutputStream) this.getStream()).size();
        }
    }
}
The in-memory OutputFile can then be handed to AvroParquetWriter's builder:

// Requires: org.apache.avro.Schema, org.apache.avro.data.TimeConversions,
// org.apache.avro.generic.GenericData, org.apache.avro.specific.SpecificRecordBase,
// org.apache.parquet.avro.AvroParquetWriter, org.apache.parquet.hadoop.ParquetWriter,
// org.apache.parquet.hadoop.ParquetFileWriter,
// org.apache.parquet.hadoop.metadata.CompressionCodecName
public static <T extends SpecificRecordBase> void writeToParquet(List<T> avroObjects) throws IOException {
    Schema avroSchema = avroObjects.get(0).getSchema();
    GenericData genericData = GenericData.get();
    genericData.addLogicalTypeConversion(new TimeConversions.DateConversion());

    InMemoryOutputFile outputFile = new InMemoryOutputFile();
    try (ParquetWriter<Object> writer = AvroParquetWriter.builder(outputFile)
            .withDataModel(genericData)
            .withSchema(avroSchema)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .withWriteMode(ParquetFileWriter.Mode.CREATE)
            .build()) {
        avroObjects.forEach(r -> {
            try {
                writer.write(r);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        });
    }

    // dump memory data to file for testing
    Files.write(Paths.get("./users-memory.parquet"), outputFile.toArray());
}
The dumped file can then be inspected with parquet-tools:

$ parquet-tools cat --json users-memory.parquet
$ parquet-tools schema users-memory.parquet
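Applied back to the question's Kafka Connect snippet, a minimal sketch (the record and avroData variables come from the question; the fromConnectData call is an assumption based on Confluent's AvroData API, and error handling is omitted):

// Sketch: the same builder as above, but fed from a Kafka Connect SinkRecord.
Schema avroSchema = avroData.fromConnectSchema(record.valueSchema());
InMemoryOutputFile outputFile = new InMemoryOutputFile();
try (ParquetWriter<Object> writer = AvroParquetWriter.builder(outputFile)
        .withSchema(avroSchema)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .withWriteMode(ParquetFileWriter.Mode.CREATE)
        .build()) {
    // Convert the Connect value to an Avro object before writing (assumed API).
    writer.write(avroData.fromConnectData(record.valueSchema(), record.value()));
}
byte[] parquetBytes = outputFile.toArray(); // the complete Parquet file, never written to disk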