Python: sharing HappyBase connections across Spark maps


I am using Spark with HBase (via the HappyBase library), and everything works fine with small datasets. However, with large datasets the connection to the HBase Thrift server is lost after the map function has been called many times. I am currently running on a single (pseudo-distributed) node.

Specifically, the map function fails with the following error:

TTransportException: Could not connect to localhost:9090
The map function:

def save_triples(triple, ac, table_name, ac_vertex_id, graph_table_name):
    # a new connection is opened (and closed) for every single record
    connection = happybase.Connection(HBASE_SERVER_IP, compat='0.94')
    table = connection.table(table_name)
    [...]
    connection.close()
This is the call to the map function:

counts = lines.map(lambda x: save_triples(x, ac, table_name, ac_vertex_id, graph_table_name))
output = counts.collect()
I suspect this happens because too many connections are being opened. I tried creating the "connection" object in the main function and passing it as a parameter to the map function (something similar works with the HBase client library in Java), but I get the following error:

pickle.PicklingError: Can't pickle builtin <type 'method_descriptor'>
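One direction I am considering is to open a single connection per partition with mapPartitions instead of one per record, so that the connection is created on the worker and never has to be pickled. Below is only a rough sketch of what I mean (save_partition_triples is an illustrative name, the extra arguments of save_triples are omitted for brevity, and I have not verified that this fixes the Thrift errors):

import happybase

def save_partition_triples(partition, table_name):
    # One connection per partition, created on the worker, so nothing
    # has to be pickled and shipped from the driver.
    connection = happybase.Connection(HBASE_SERVER_IP, compat='0.94')
    table = connection.table(table_name)
    results = []
    for triple in partition:
        # [...] the same per-record logic as in save_triples
        results.append(triple)
    connection.close()
    return results

counts = lines.mapPartitions(lambda part: save_partition_triples(part, table_name))
output = counts.collect()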
Any help would be greatly appreciated.

I ran into the same problem as well.