Java: Writing TFRecords from a Beam pipeline?

Tags: java, tensorflow, apache-beam, tfrecord

I have some data in Map format, and I want to convert it to TFRecords using a Beam pipeline. Below is my attempt at the code. I got this working in Python, but I need to implement it in Java because some of the business logic cannot be ported to Python. The corresponding working Python implementation can be found in this post.

import com.google.protobuf.ByteString;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.commons.lang3.RandomStringUtils;
import org.tensorflow.example.BytesList;
import org.tensorflow.example.Example;
import org.tensorflow.example.Feature;
import org.tensorflow.example.Features;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
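public class Sample {

    // The output type here is Example, which is what TFRecordIO.write() later rejects.
    static class Foo extends DoFn<Map<String, String>, Example> {

        public static Feature stringToFeature(String value) {
            ByteString byteString = ByteString.copyFrom(value.getBytes(StandardCharsets.UTF_8));
            BytesList bytesList = BytesList.newBuilder().addValue(byteString).build();
            return Feature.newBuilder().setBytesList(bytesList).build();
        }

        @ProcessElement
        public void processElement(@Element Map<String, String> element, OutputReceiver<Example> receiver) {

            Features features = Features.newBuilder()
                    .putFeature("foo", stringToFeature(element.get("foo")))
                    .putFeature("bar", stringToFeature(element.get("bar")))
                    .build();

            Example example = Example
                    .newBuilder()
                    .setFeatures(features)
                    .build();

            receiver.output(example);
        }

    }

    private static Map<String, String> generateRecord() {
        String[] keys = {"foo", "bar"};
        return IntStream.range(0, keys.length)
                .boxed()
                .collect(Collectors
                        .toMap(i -> keys[i],
                                // The method name is garbled in the source; randomAlphabetic is assumed.
                                i -> RandomStringUtils.randomAlphabetic(8)));
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();

        for (int i = 0; i ...

[The rest of the question's listing is missing from the source.]

You need to convert the input to TFRecordIO into byte[].

You can do this with a transform like: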

static class StringToByteArray extends DoFn<String, byte[]> {
 @ProcessElement
 public void processElement(ProcessContext c) {
  c.output(c.element().getBytes(Charsets.UTF_8));
 }
} 

The input to TFRecordIO.write() should be byte[], so making the following change worked for me:

// The output type is byte[] because that is what TFRecordIO.write() accepts.
static class Foo extends DoFn<Map<String, String>, byte[]> {

    // Wraps a string as a single-valued bytes feature.
    public static Feature stringToFeature(String value) {
        ByteString byteString = ByteString.copyFrom(value.getBytes(StandardCharsets.UTF_8));
        BytesList bytesList = BytesList.newBuilder().addValue(byteString).build();
        return Feature.newBuilder().setBytesList(bytesList).build();
    }

    // Beam only invokes a DoFn method that carries this annotation.
    @ProcessElement
    public void processElement(@Element Map<String, String> element, OutputReceiver<byte[]> receiver) {

        Features features = Features.newBuilder()
                .putFeature("foo", stringToFeature(element.get("foo")))
                .putFeature("bar", stringToFeature(element.get("bar")))
                .build();

        Example example = Example
                .newBuilder()
                .setFeatures(features)
                .build();

        // Serialize the protobuf message so the element is a plain byte[].
        receiver.output(example.toByteArray());
    }

}
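
As background on why the sink wants plain bytes: a TFRecord file is a sequence of opaque byte payloads, each wrapped in a small frame (an 8-byte little-endian length, a masked CRC32C of that length, the payload, and a masked CRC32C of the payload). The sketch below illustrates that framing with the JDK only (Java 9+ for java.util.zip.CRC32C). It is not Beam's implementation, and the class and method names (TfRecordFraming, frame, maskedCrc32c) are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

public class TfRecordFraming {

    // TFRecord stores a "masked" CRC32C: rotate right by 15 bits, then add a constant.
    // Java int arithmetic wraps around, which matches the unsigned 32-bit math.
    public static int maskedCrc32c(byte[] bytes) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, 0, bytes.length);
        int c = (int) crc.getValue();
        return ((c >>> 15) | (c << 17)) + 0xa282ead8;
    }

    // Wraps one payload as: length (8 bytes) | crc(length) | payload | crc(payload).
    public static byte[] frame(byte[] payload) {
        byte[] lengthBytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length)
                .array();
        return ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put(lengthBytes)
                .putInt(maskedCrc32c(lengthBytes))
                .put(payload)
                .putInt(maskedCrc32c(payload))
                .array();
    }

    public static void main(String[] args) {
        // In the pipeline above, the payload would be example.toByteArray().
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] record = frame(payload);
        System.out.println(record.length); // 8 + 4 + 5 + 4 = 21
    }
}
```

The writer never looks inside the payload, which is why the DoFn has to serialize the Example itself before handing it over.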

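Isn't this a job for ProtoCoder, which handles serialization of protobuf messages?

Coders do not change the element type. They are only used to encode and decode elements efficiently when the elements are serialized or deserialized, and for type checking. If you check the documentation for TFRecordIO.Write, it expects byte[] as input. For reference, see the following documentation.

Glad to know it worked. Could you accept the answer, since it solved the problem, to help the community?

@bruce_wayne, I have a similar requirement, so I tried to compile your code, but I get a compile-time error: Error:(66, 82) java: incompatible types: java.lang.Class cannot be converted to org.apache.beam.sdk.coders.Coder. Any thoughts on this?

The return type of processElement should be bytes, not an object; please check my answer below.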
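
The point made in the comments about coders can be illustrated without Beam. In the sketch below, SimpleCoder is a hypothetical, heavily simplified stand-in for a coder (it is not Beam's Coder API): encoding and decoding are a round trip that hands the same type back, so only an explicit transform step changes what the next stage receives.

```java
import java.nio.charset.StandardCharsets;

public class CoderVsTransform {

    /** Hypothetical, simplified coder: turns a T into bytes and back again. */
    public interface SimpleCoder<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    public static final SimpleCoder<String> UTF8 = new SimpleCoder<String>() {
        @Override
        public byte[] encode(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    /** Roughly what happens when an element moves between steps: encode, ship, decode. */
    public static <T> T shipBetweenSteps(T value, SimpleCoder<T> coder) {
        byte[] onTheWire = coder.encode(value);
        return coder.decode(onTheWire); // the next step still receives a T
    }

    /** An explicit transform step: this is what actually changes the element type. */
    public static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String element = shipBetweenSteps("foo", UTF8); // still a String
        byte[] forSink = toBytes(element);              // now a byte[]
        System.out.println(element + " -> " + forSink.length + " bytes");
    }
}
```

By analogy, setting a coder on the output of a step would not turn Example elements into byte[]; calling toByteArray() inside the DoFn does.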