PySpark: writing a DataFrame to HBase stores integer values as bytes


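When I write a DataFrame to HBase through PySpark, integer values are stored as bytes by default. Is there an option to have them stored as readable integers instead? In the HBase table, the integer values show up as byte sequences.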

Below is the code:

    import json

    # Catalog mapping DataFrame columns to HBase column families and qualifiers
    catalog2 = {
        "table": {"namespace": "default", "name": "trip_test1"},
        "rowkey": "key1",
        "columns": {
            "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
            "payment_type": {"cf": "sales", "col": "payment_type", "type": "string"},
            "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
            "surcharge": {"cf": "sales", "col": "surcharge", "type": "string"},
            "mta_tax": {"cf": "sales", "col": "mta_tax", "type": "string"},
            "tip_amount": {"cf": "sales", "col": "tip_amount", "type": "string"},
            "tolls_amount": {"cf": "sales", "col": "tolls_amount", "type": "string"},
            "total_amount": {"cf": "sales", "col": "total_amount", "type": "string"}
        }
    }

    # The connector expects the catalog as a JSON string
    cat2 = json.dumps(catalog2)

    df.write.option("catalog", cat2).option("newtable", "5").format("org.apache.spark.sql.execution.datasources.hbase").save()

Output:

\x00\x00\x03\xE7 column=sales:payment_type, timestamp=1529495930994, value=CSH
\x00\x00\x03\xE7 column=sales:surcharge, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tip_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tolls_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:total_amount, timestamp=1529495930994, value=@!\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:fare_amount, timestamp=1529495930994, value=@\x18\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:mta_tax, timestamp=1529495930994, value=?\xE0\x00\x00\x00\x00\x00\x00
Expected output:

999 column=sales:fare_amount, timestamp=1529392479358, value=8.0
999 column=sales:mta_tax, timestamp=1529392479358, value=0.5
999 column=sales:payment_type, timestamp=1529392479358, value=CSH
999 column=sales:surcharge, timestamp=1529392479358, value=0.0
999 column=sales:tip_amount, timestamp=1529392479358, value=0.0
999 column=sales:tolls_amount, timestamp=1529392479358, value=0.0
999 column=sales:total_amount, timestamp=1529392479358, value=8.5

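Numeric values are converted to bytes and then stored in HBase. When reading the data back from HBase, you must use the same library (in your example, "org.apache.spark.sql.execution.datasources.hbase") to get the exact values back.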
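To illustrate that the stored bytes still hold the original numbers, here is a minimal sketch that decodes a few cell values from the question's output with Python's standard `struct` module. It assumes the connector wrote the values as big-endian 4-byte integers and 8-byte IEEE 754 doubles, which is consistent with the output shown but is not confirmed by the post. The helper names `decode_int` and `decode_double` are made up for this example.

```python
import struct

# Raw cell values copied from the hbase shell output in the question.
row_key = b"\x00\x00\x03\xE7"
total_amount = b"@!\x00\x00\x00\x00\x00\x00"
mta_tax = b"?\xE0\x00\x00\x00\x00\x00\x00"


def decode_int(raw: bytes) -> int:
    """Decode a 4-byte big-endian signed integer (assumed encoding)."""
    return struct.unpack(">i", raw)[0]


def decode_double(raw: bytes) -> float:
    """Decode an 8-byte big-endian IEEE 754 double (assumed encoding)."""
    return struct.unpack(">d", raw)[0]


print(decode_int(row_key))          # 999
print(decode_double(total_amount))  # 8.5
print(decode_double(mta_tax))       # 0.5
```

The hbase shell prints printable bytes as ASCII characters, which is why 0x40 0x21 appears as `@!` in the output.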

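If you want the values to appear as readable numbers in HBase, cast the columns to string type before writing. The library "org.apache.spark.sql.execution.datasources.hbase" stores a string as its text rather than in a binary numeric encoding.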
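The cast-to-string suggestion could be sketched as follows. This is only an illustration under stated assumptions: `string_cast_exprs` and `write_as_strings` are hypothetical helper names, `df` is assumed to be the DataFrame from the question, and the write options are copied from the question unchanged. The only DataFrame calls used are `selectExpr` and the same `write` chain as in the question.

```python
import json

HBASE_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def string_cast_exprs(catalog: dict) -> list:
    """Build one 'CAST(col AS STRING) AS col' expression per catalog column."""
    return [
        f"CAST(`{name}` AS STRING) AS `{name}`"
        for name in catalog["columns"]
    ]


def write_as_strings(df, catalog: dict) -> None:
    """Cast every catalog column to string, then write with the question's options."""
    string_df = df.selectExpr(*string_cast_exprs(catalog))
    (string_df.write
        .option("catalog", json.dumps(catalog))
        .option("newtable", "5")
        .format(HBASE_FORMAT)
        .save())
```

With the names from the question, the call would be `write_as_strings(df, catalog2)`. The catalog already declares every column as "string", so no catalog change should be needed for this approach.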


For best results, make sure the data types of the DataFrame columns match the types declared in the catalog.
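One way to check this before writing is to compare `df.dtypes` (a list of `(column, type)` pairs) against the catalog. The sketch below uses a hypothetical helper, `find_type_mismatches`, and assumes the catalog spells type names the same way Spark does in `df.dtypes` (for example "int", "double", "string"); that may not hold for every type.

```python
def find_type_mismatches(dtypes, catalog):
    """Return (column, dataframe_type, catalog_type) for each disagreement.

    dtypes is a list of (name, type) pairs, as returned by df.dtypes.
    A column missing from the DataFrame is reported with type None.
    """
    actual = dict(dtypes)
    mismatches = []
    for name, spec in catalog["columns"].items():
        df_type = actual.get(name)
        if df_type != spec["type"]:
            mismatches.append((name, df_type, spec["type"]))
    return mismatches
```

With the names from the question, `find_type_mismatches(df.dtypes, catalog2)` would list every column whose DataFrame type differs from the "string" type declared in the catalog.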

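Your question is unclear. HBase only stores byte arrays; the application is responsible for converting the data when reading. The output you posted seems to come from the hbase shell, which naturally shows only byte arrays.

When the DataFrame is written to the HBase table, the integer values are converted to bytes. Is there another way, by modifying the df.write command given above, to have integer values stored as integers in the HBase table?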