Spark Python script not writing to HBase (python, apache-spark, hbase)
I am trying to run the script shown at the bottom of this post. I cannot get it to work: I tried three different command-line options, but none of them produced any output or wrote any data to the HBase table.

Here are the command-line options I tried:
spark-submit --jars /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar --jars /usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2389 > sp_json.log

spark-submit --driver-class-path /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar sp_json.py localhost 2389 > sp_json.log

spark-submit --driver-class-path /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar --jars /usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2389 > sp_json.log
The log file is too verbose. That is one of the reasons debugging is difficult in Apache Spark: it spits out so much information. I finally got it working with the following command syntax:

spark-submit --jars /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar,/usr/local/hbase/lib/hbase-examples-1.1.2.jar sp_json.py localhost 2399 > sp_json.log
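The difference from the first failing attempt is that spark-submit takes a single --jars option whose value is a comma-separated list of jar paths; as far as I can tell, repeating the flag simply replaces the earlier value, so only one of the jars ends up being shipped. The general shape (a sketch reusing the jar paths from this post) is:

```shell
# One --jars flag; jars separated by commas, with no spaces around the comma.
spark-submit \
  --jars /usr/local/spark/lib/spark-examples-1.5.2-hadoop2.4.0.jar,/usr/local/hbase/lib/hbase-examples-1.1.2.jar \
  sp_json.py localhost 2399 > sp_json.log
```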
import sys
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def SaveRecord(rdd):
    host = 'sparkmaster.example.com'
    table = 'cats'
    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
    conf = {"hbase.zookeeper.quorum": host,
            "hbase.mapred.outputtable": table,
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
    datamap = rdd.map(lambda x: (str(json.loads(x)["id"]), [str(json.loads(x)["id"]), "cfamily", "cats_json", x]))
    datamap.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: StreamCatsToHBase.py <hostname> <port>")
        exit(-1)

    sc = SparkContext(appName="StreamCatsToHBase")
    ssc = StreamingContext(sc, 1)

    lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
    lines.foreachRDD(SaveRecord)

    ssc.start()             # Start the computation
    ssc.awaitTermination()  # Wait for the computation to terminate
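For reference, the (key, value) shape that SaveRecord's datamap produces can be checked locally without a cluster. Below is a minimal sketch using a made-up JSON record (the record contents are my own assumption, not from the original post); it also parses the JSON once instead of twice, which is a small cleanup of the lambda above:

```python
import json

def to_hbase_tuple(x):
    # Parse once instead of calling json.loads(x) twice as the lambda above does.
    row_id = str(json.loads(x)["id"])
    # (rowkey, [rowkey, column family, qualifier, raw JSON string]) is the
    # list shape that StringListToPutConverter turns into an HBase Put.
    return (row_id, [row_id, "cfamily", "cats_json", x])

# Hypothetical sample record, purely for illustration.
sample = json.dumps({"id": 42, "name": "Whiskers"})
key, value = to_hbase_tuple(sample)
print(key)    # prints: 42
print(value)
```

In the script itself this function would replace the lambda, i.e. `datamap = rdd.map(to_hbase_tuple)`.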