Java: writing TFRecords from a Beam pipeline?
I have some data in Map format that I want to convert to TFRecords using a Beam pipeline. Below is my attempt at the code. I have already done this in Python, but I need to implement it in Java because some of the business logic cannot be ported to Python. The corresponding working Python implementation can be found in this post.

import com.google.protobuf.ByteString;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.commons.lang3.RandomStringUtils;
import org.tensorflow.example.BytesList;
import org.tensorflow.example.Example;
import org.tensorflow.example.Feature;
import org.tensorflow.example.Features;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Sample {

    // Attempt: emits Example protos directly (this is what TFRecordIO rejects).
    static class Foo extends DoFn<Map<String, String>, Example> {

        public static Feature stringToFeature(String value) {
            ByteString byteString = ByteString.copyFrom(value.getBytes(StandardCharsets.UTF_8));
            BytesList bytesList = BytesList.newBuilder().addValue(byteString).build();
            return Feature.newBuilder().setBytesList(bytesList).build();
        }

        @ProcessElement
        public void processElement(@Element Map<String, String> element, OutputReceiver<Example> receiver) {
            Features features = Features.newBuilder()
                    .putFeature("foo", stringToFeature(element.get("foo")))
                    .putFeature("bar", stringToFeature(element.get("bar")))
                    .build();
            Example example = Example
                    .newBuilder()
                    .setFeatures(features)
                    .build();
            receiver.output(example);
        }
    }

    // Builds one record with a random 8-character value for each key.
    private static Map<String, String> generateRecord() {
        String[] keys = {"foo", "bar"};
        return IntStream.range(0, keys.length)
                .boxed()
                .collect(Collectors
                        .toMap(i -> keys[i],
                                i -> RandomStringUtils.randomAlphabetic(8)));
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 0; i
        // ... (the remainder of the listing is not preserved)

You need to convert the elements you pass to TFRecordIO into byte[].
You can do this with a transform such as:
// Converts each String element to its UTF-8 bytes.
static class StringToByteArray extends DoFn<String, byte[]> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element().getBytes(StandardCharsets.UTF_8));
    }
}
The input to TFRecordIO.write() should be byte[], so making the following change worked for me:
static class Foo extends DoFn<Map<String, String>, byte[]> {

    // Wraps a string as a bytes_list Feature holding its UTF-8 bytes.
    public static Feature stringToFeature(String value) {
        ByteString byteString = ByteString.copyFrom(value.getBytes(StandardCharsets.UTF_8));
        BytesList bytesList = BytesList.newBuilder().addValue(byteString).build();
        return Feature.newBuilder().setBytesList(bytesList).build();
    }

    // Beam only invokes this method if it carries the @ProcessElement annotation.
    @ProcessElement
    public void processElement(@Element Map<String, String> element, OutputReceiver<byte[]> receiver) {
        Features features = Features.newBuilder()
                .putFeature("foo", stringToFeature(element.get("foo")))
                .putFeature("bar", stringToFeature(element.get("bar")))
                .build();
        Example example = Example
                .newBuilder()
                .setFeatures(features)
                .build();
        // Emit the serialized proto, since TFRecordIO.write() takes byte[] elements.
        receiver.output(example.toByteArray());
    }
}
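For context on why the sink wants raw bytes: as far as I understand the TFRecord format, each record is an opaque payload framed by a little-endian length and two masked CRC32C checksums, so the writer never needs to know that the payload is a serialized Example. The following is a minimal, stdlib-only sketch of that framing and is not Beam's implementation. The class and method names (TfRecordFramingSketch, frame, maskedCrc32c) are made up for illustration, and it assumes Java 9 or later for java.util.zip.CRC32C.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical helper, for illustration only: frames one payload the way a
// TFRecord file stores it (length, length CRC, payload, payload CRC).
final class TfRecordFramingSketch {

    // TFRecord stores a "masked" CRC32C: rotate right by 15 bits, then add a constant.
    static int maskedCrc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        int value = (int) crc.getValue();
        return Integer.rotateRight(value, 15) + 0xa282ead8;
    }

    // Returns: 8-byte length | 4-byte CRC of length | payload | 4-byte CRC of payload.
    static byte[] frame(byte[] payload) {
        byte[] length = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length)
                .array();
        ByteBuffer out = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.put(length);
        out.putInt(maskedCrc32c(length));
        out.put(payload);
        out.putInt(maskedCrc32c(payload));
        return out.array();
    }

    public static void main(String[] args) {
        // In the pipeline above, the payload would be example.toByteArray().
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] record = frame(payload);
        System.out.println(record.length); // 8 + 4 + 5 + 4 = 21
    }
}
```

In the pipeline itself TFRecordIO does this framing for you; the DoFn only has to emit example.toByteArray().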
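This is not a job for ProtoCoder, which handles the serialization of protobuf messages. Coders do not change the element type. They are only used to encode and decode elements efficiently when those elements are serialized, deserialized, and type-checked. If you check the documentation for TFRecordIO.Write, it expects byte[] as input. For reference, see the documentation.

Glad to know it works. Could you accept the answer, since it solved the problem, to help the community?

@bruce_wayne, I have a similar requirement, so I tried to compile your code but got a compile-time error: Error:(66, 82) java: incompatible types: java.lang.Class cannot be converted to org.apache.beam.sdk.coders.Coder. Any thoughts on this?

The output type of processElement should be bytes, not an object. Please see my answer.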