Extracting binary data from a Spark dataframe in Java

Tags: dataframe, apache-spark, apache-spark-sql, spark-java

I have a dataframe with the following schema:

root
 |-- blob: binary (nullable = true)
The data looks like this:

+------------------------------------------------------------+
|blob                                                        |
+------------------------------------------------------------+
|[1F 8B 08 00 00 00 00 00 00 00 7B C0 C8 F0 EA 1D E3 FC C6 E6 7B 8C B3 DA F7 31 4E EB 3B 94 9C 9F AB 97 98 9B 58 95 9F A7 97 59 92 9A 1B 9F 9B 58 5C 92 5A A4 17 94 9A 93 58 92 99 9F 57 9C 91 59 10 1F 90 58 94 9A 57 12 1F 92 1F EF 9C 91 99 93 12 1F 9E 59 92 11 EF 92 9A 53 92 E8 60 A8 67 D0 99 92 9F 9B 98 99 17 9F 99 D2 0E 36 21 33 A5 A7 2C B5 A8 18 A8 39 BE 24 33 37 B5 AF 2F 37 B1 28 3B B5 A4 20 27 31 39 15 28 D9 5B 5C 9A 94 9B 59 0C 96 CF 4C E9 9B 9C 0C 36 B2 08 C9 BE E2 77 9B 1A BB EE AD EB 56 52 DF D5 D3 E5 64 60 69 66 EC E5 11 10 16 D4 AA C8 D4 9B DD C0 FF B4 75 4E C7 94 BE C3 4C 8B FA 94 BA B3 FB D5 98 98 8B 9F DE D1 9A 70 01 00 F6 9B E3 17 DA 00 00 00]|
+------------------------------------------------------------+
I want to use the map function on the dataframe to read this column value and perform some actions:

d.map(relationshipMapFunction, encoder)

where, inside RelationshipMapFunction, I am trying to extract the blob shown above:

public class RelationshipMapFunction implements MapFunction<Row, String> {
    private static final long serialVersionUID = 6766320395808127072L;
    private static Logger LOG = Logger.getLogger(JobRunner.class);

    @Override
    public String call(Row row) throws Exception {
        // Code to read binary data and perform some actions
        return null; // placeholder so the stub compiles
    }
}

How can I extract the byte array from the row variable inside the call method?

There are several options; which one is best depends on your requirements. Let's look at some of them.

Dataframe map function

To follow the code you showed, extracting the byte array directly from an object of type Row, you can use the getAs method of the Row class:

public class RelationshipMapFunction implements MapFunction<Row, String> {
    @Override
    public String call(Row row) throws Exception {
        final byte[] blob = row.<byte[]>getAs(0); // 0 is the index of the blob column in the dataframe
        return transformBlob(blob);
    }
}
Then you can use it with the dataframe, providing an encoder for the result type (since the map function takes a Row, the dataframe can be mapped directly, without converting it first):

Dataset<String> mappedDs = df
    .map(new RelationshipMapFunction(), Encoders.STRING());

In both cases, if the map function is simple, you can replace it with a lambda.

Other options include using a UDF.


Note that if the transformation you need is as simple as decoding a UTF-8 string from the byte array, you can use Spark SQL functions directly (a cast from binary to string will do).
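Outside of Spark, that simple case amounts to interpreting the raw bytes with a charset. A minimal plain-Java sketch of what the cast does (BlobDecoder and decodeUtf8 are hypothetical names, not part of any Spark API):

```java
import java.nio.charset.StandardCharsets;

public class BlobDecoder {
    // Equivalent of Spark SQL's cast(blob as string): interpret the
    // raw bytes of the binary column as a UTF-8 encoded string.
    public static String decodeUtf8(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(sample)); // prints "hello"
    }
}
```

This only applies if the blob really is UTF-8 text; a compressed payload like the sample in the question would need decompression first.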

I got to a similar solution using byte[] attrValBytes = (byte[]) r.get(0); to get the byte array. Thanks!
Alternatively, you can first convert the dataframe to a Dataset<byte[]> with Encoders.BINARY(), so that the map function receives the byte array directly:
public class RelationshipMapFunction implements MapFunction<byte[], String> {
    @Override
    public String call(byte[] blob) {
        return transformBlob(blob);
    }
}
Dataset<String> mappedDs = df
    .as(Encoders.BINARY())
    .map(new RelationshipMapFunction(), Encoders.STRING());
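The sample bytes in the question begin with 1F 8B, the gzip magic number, so one plausible transformBlob (this is an assumption about what "perform some actions" means; the method name is taken from the snippets above) would decompress the payload with the standard java.util.zip classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class BlobTransformer {
    // Decompress a gzip-compressed blob (magic bytes 1F 8B) and
    // return its contents as a UTF-8 string.
    public static String transformBlob(byte[] blob) throws IOException {
        try (GZIPInputStream gis =
                 new GZIPInputStream(new ByteArrayInputStream(blob));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gis.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

This would be called from the map function above as return transformBlob(blob);. If a blob is not actually gzip data, the GZIPInputStream constructor throws a ZipException that you would need to handle.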