Apache Spark: decoding and gunzipping an embedded Base64 string

apache-spark pyspark

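My Spark program reads a file that contains a gzip-compressed string encoded as Base64. I have to decode and decompress it. I use Spark's unbase64 to decode the value, which produces a byte array: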

bytedf = df.withColumn("unbase", unbase64(col("value")))

Is there a built-in Spark method for decompressing the byte array?

An example using base64 in Spark:

import base64
.
.
# Decode the Base64 string with a map over the underlying RDD, or create a UDF instead.
# Assumes the encoded text is in the "value" column, as in the question.
decoded = df.rdd.map(lambda row: base64.b64decode(row["value"]))

See the detailed Python example.

I wrote a UDF:

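import base64
import zlib
from pyspark.sql.functions import udf

def decompress(ip):
    # Decode the Base64 text into the raw gzip bytes.
    bytecode = base64.b64decode(ip)
    # 32 + zlib.MAX_WBITS makes zlib auto-detect a gzip or zlib header.
    d = zlib.decompressobj(32 + zlib.MAX_WBITS)
    decompressed_data = d.decompress(bytecode)
    return decompressed_data.decode('utf-8')

decompress_udf = udf(decompress)
decompressedDF = df.withColumn("decompressed_XML", decompress_udf("value"))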
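The decoding logic inside that UDF can be checked without a Spark session. The following is only a sketch: the function name decode_gzip_b64 and the sample XML are made up for illustration, and it assumes the values in the file were produced by gzip followed by Base64.

```python
import base64
import gzip
import zlib


def decode_gzip_b64(encoded):
    """Base64-decode a string, gunzip the result, and return UTF-8 text."""
    raw = base64.b64decode(encoded)
    # 32 + zlib.MAX_WBITS: accept either a gzip or a zlib header.
    decompressor = zlib.decompressobj(32 + zlib.MAX_WBITS)
    data = decompressor.decompress(raw) + decompressor.flush()
    return data.decode("utf-8")


# Build a sample value the way the input is assumed to have been produced.
sample_xml = "<root>hello</root>"
encoded_value = base64.b64encode(
    gzip.compress(sample_xml.encode("utf-8"))
).decode("ascii")

# The round trip should give back the original text.
print(decode_gzip_b64(encoded_value) == sample_xml)
```

Once this behaves as expected on a few real values, the same function can be wrapped with udf(...) as shown above.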

I had a similar case, and this is what I did:

from pyspark.sql.functions import col, unbase64, udf
from gzip import decompress

# unbase64 turns the Base64 text into a binary column.
bytedf = df1.withColumn("unbase", unbase64(col("payload")))
# gunzip the bytes and decode them as UTF-8 text.
decompress_func = lambda x: decompress(x).decode('utf-8')
udf_decompress = udf(decompress_func)
df2 = bytedf.withColumn('unbase_decompress', udf_decompress('unbase'))
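One thing to watch with this approach is null rows: Spark hands a null to a Python UDF as None, and gzip.decompress(None) would fail. Below is a null-safe variant of the lambda, written as plain Python so it can be tried outside Spark. The name gunzip_to_text is hypothetical, and the sketch assumes the binary column reaches the UDF as a bytes-like object such as bytearray.

```python
import gzip


def gunzip_to_text(data):
    """Return the gunzipped UTF-8 text, or None when the input is null."""
    if data is None:
        return None
    # bytes(...) normalises a bytearray or memoryview before decompressing.
    return gzip.decompress(bytes(data)).decode("utf-8")


# Simulate what the UDF would receive for one non-null row.
payload = bytearray(gzip.compress("payload text".encode("utf-8")))

print(gunzip_to_text(payload))
print(gunzip_to_text(None))
```

It could then replace decompress_func in the snippet above, for example udf_decompress = udf(gunzip_to_text).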

What data is stored in the byte array? Is it a primitive type (string/int/long/double/…) or a custom object?
bytedf has a byte array.