
Python: Reading from HBase with PySpark


I am trying to write to and read from HBase using PySpark.

Environment:

  • CDH 5.13
  • HBase 1.2.0
  • Spark 2.3 (installed as a parcel)
  • Python 3.6
  • PyCharm
I am using HBase Spark Connector Project Core » 1.1.1-2.1-s_2.11.

My code is:

from pyspark import SparkConf, SQLContext
from pyspark.sql import SparkSession
from datetime import datetime
import json

conf = (SparkConf()
       .setAppName("RW_from_HBase"))

spark = SparkSession.builder \
     .appName(" ") \
     .config(conf=conf) \
     .getOrCreate()

sc = spark.sparkContext
sqlc = SQLContext(sc)

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

catalog = json.dumps(
    {
        "table":{"namespace":"spark", "name":"test_table"},
        "rowkey":"id",
        "columns":{
            "id":{"cf":"rowkey", "col":"id", "type":"string"},
            "filename":{"cf":"content", "col":"filename", "type":"string"},
            "created_ts":{"cf":"content", "col":"created_ts", "type":"string"},
            "html":{"cf":"content", "col":"html", "type":"string"}
        }
    })
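# Note: in an SHC catalog, "rowkey" names the row-key field and each entry
# under "columns" maps a DataFrame column to an HBase column family ("cf")
# and qualifier ("col"); the row-key column uses the reserved family "rowkey".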

# Writing into HBase (mydf is an existing DataFrame whose columns match the
# catalog; newtable=5 asks SHC to create the table with 5 regions if missing)
mydf.write\
    .options(catalog=catalog, newtable=5)\
    .format(data_source_format)\
    .save()

# Reading from HBase
df = sqlc.read\
    .options(catalog=catalog)\
    .format(data_source_format)\
    .load()

df.show()
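For completeness, here is a minimal sketch of how mydf might be built so that the write above has an input. The row values are hypothetical; only the column names must match the catalog:

sample = [("1", "page1.html", datetime.now().isoformat(), "<html>...</html>")]
mydf = spark.createDataFrame(sample, ["id", "filename", "created_ts", "html"])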
My spark-submit is:

--master local[*] --packages com.databricks:spark-avro_2.11:4.0.0,com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/repositories/releases/ --queue PyCharmSpark pyspark-shell
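The trailing pyspark-shell suggests this argument string is passed through the PYSPARK_SUBMIT_ARGS environment variable, which is the usual way to configure a session launched from an IDE such as PyCharm. A sketch of that setup (it must run before pyspark is imported):

import os
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master local[*] "
    "--packages com.databricks:spark-avro_2.11:4.0.0,"
    "com.hortonworks:shc-core:1.1.1-2.1-s_2.11 "
    "--repositories http://repo.hortonworks.com/content/repositories/releases/ "
    "--queue PyCharmSpark pyspark-shell"
)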
When I write into HBase, everything works fine and the data from mydf is saved to the HBase table.

When I try to read, everything works until a Spark action is triggered: load() is lazy and only builds the query plan, so the failure surfaces only when df.show() forces the actual HBase scan. df.show() causes this error:

WARNING: Running spark-class from user-defined location.
http://repo.hortonworks.com/content/repositories/releases/ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/cloudera/.ivy2/cache
The jars for the packages stored in: /home/cloudera/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.11 added as a dependency
com.hortonworks#shc-core added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-avro_2.11;4.0.0 in central
    found org.slf4j#slf4j-api;1.7.5 in central
    found org.apache.avro#avro;1.7.6 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.5 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found com.hortonworks#shc-core;1.1.1-2.1-s_2.11 in repo-1
    found org.apache.hbase#hbase-server;1.1.2 in central
    found org.apache.hbase#hbase-protocol;1.1.2 in central
    found org.apache.hbase#hbase-annotations;1.1.2 in central
    found com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 in central
    found log4j#log4j;1.2.17 in central
    found junit#junit;4.11 in central
    found org.hamcrest#hamcrest-core;1.3 in central
    found com.google.protobuf#protobuf-java;2.5.0 in central
    found org.apache.hbase#hbase-procedure;1.1.2 in central
    found com.google.guava#guava;12.0.1 in central
    found com.google.code.findbugs#jsr305;1.3.9 in central
    found org.apache.hbase#hbase-client;1.1.2 in central
    found commons-codec#commons-codec;1.9 in central
    found commons-io#commons-io;2.4 in central
    found commons-lang#commons-lang;2.6 in central
    found io.netty#netty-all;4.0.23.Final in central
    found org.apache.zookeeper#zookeeper;3.4.6 in central
    found org.slf4j#slf4j-api;1.7.7 in central
    found org.slf4j#slf4j-log4j12;1.6.1 in central
    found org.apache.htrace#htrace-core;3.1.0-incubating in central
    found org.jruby.jcodings#jcodings;1.0.8 in central
    found org.jruby.joni#joni;2.1.2 in central
    found commons-httpclient#commons-httpclient;3.1 in central
    found commons-collections#commons-collections;3.2.1 in central
    found com.yammer.metrics#metrics-core;2.2.0 in central
    found com.sun.jersey#jersey-core;1.9 in central
    found com.sun.jersey#jersey-server;1.9 in central
    found commons-cli#commons-cli;1.2 in central
    found org.apache.commons#commons-math;2.2 in central
    found org.mortbay.jetty#jetty;6.1.26 in central
    found org.mortbay.jetty#jetty-util;6.1.26 in central
    found org.mortbay.jetty#jetty-sslengine;6.1.26 in central
    found org.mortbay.jetty#jsp-2.1;6.1.14 in central
    found org.mortbay.jetty#jsp-api-2.1;6.1.14 in central
    found org.mortbay.jetty#servlet-api-2.5;6.1.14 in central
    found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
    found tomcat#jasper-compiler;5.5.23 in central
    found org.jamon#jamon-runtime;2.3.1 in central
    found com.lmax#disruptor;3.3.0 in central
    found org.apache.hbase#hbase-prefix-tree;1.1.2 in central
    found org.mortbay.jetty#servlet-api;2.5-20081211 in central
    found tomcat#jasper-runtime;5.5.23 in central
    found commons-el#commons-el;1.0 in central
    found org.apache.hbase#hbase-common;1.1.2 in central
    found org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 in central
    found org.apache.tephra#tephra-api;0.9.0-incubating in central
    found org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating in central
    found org.apache.tephra#tephra-core;0.9.0-incubating in central
    found com.google.code.gson#gson;2.2.4 in central
    found com.google.guava#guava;13.0.1 in central
    found com.google.inject#guice;3.0 in central
    found javax.inject#javax.inject;1 in central
    found aopalliance#aopalliance;1.0 in central
    found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
    found asm#asm;3.1 in central
    found com.google.inject.extensions#guice-assistedinject;3.0 in central
    found ch.qos.logback#logback-classic;1.0.9 in central
    found ch.qos.logback#logback-core;1.0.9 in central
    found org.apache.thrift#libthrift;0.9.0 in central
    found org.apache.httpcomponents#httpcore;4.1.3 in central
    found it.unimi.dsi#fastutil;6.5.6 in central
    found org.apache.twill#twill-common;0.6.0-incubating in central
    found com.google.code.findbugs#jsr305;2.0.1 in central
    found org.apache.twill#twill-core;0.6.0-incubating in central
    found org.apache.twill#twill-api;0.6.0-incubating in central
    found org.apache.twill#twill-discovery-api;0.6.0-incubating in central
    found org.apache.twill#twill-zookeeper;0.6.0-incubating in central
    found org.apache.twill#twill-discovery-core;0.6.0-incubating in central
    found org.ow2.asm#asm-all;5.0.2 in central
    found io.dropwizard.metrics#metrics-core;3.1.0 in central
    found org.antlr#antlr-runtime;3.5.2 in central
    found jline#jline;2.11 in central
    found sqlline#sqlline;1.2.0 in central
    found joda-time#joda-time;1.6 in central
    found com.github.stephenc.jcip#jcip-annotations;1.0-1 in central
    found junit#junit;4.12 in central
    found org.apache.httpcomponents#httpclient;4.0.1 in central
    found commons-logging#commons-logging;1.2 in central
    found org.iq80.snappy#snappy;0.3 in central
    found commons-collections#commons-collections;3.2.2 in central
    found org.apache.commons#commons-csv;1.0 in central
    found org.apache.hbase#hbase-annotations;1.1.3 in central
    found org.apache.hbase#hbase-protocol;1.1.3 in central
    found org.apache.hadoop#hadoop-common;2.7.1 in central
    found org.apache.hadoop#hadoop-annotations;2.7.1 in central
    found org.apache.commons#commons-math3;3.1.1 in central
    found xmlenc#xmlenc;0.52 in central
    found commons-net#commons-net;3.1 in central
    found javax.servlet#servlet-api;2.5 in central
    found com.sun.jersey#jersey-json;1.9 in central
    found org.codehaus.jettison#jettison;1.1 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
    found javax.xml.bind#jaxb-api;2.2.2 in central
    found javax.xml.stream#stax-api;1.0-2 in central
    found javax.activation#activation;1.1 in central
    found org.codehaus.jackson#jackson-xc;1.9.2 in central
    found net.java.dev.jets3t#jets3t;0.9.0 in central
    found org.apache.httpcomponents#httpcore;4.2.5 in central
    found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
    found commons-configuration#commons-configuration;1.6 in central
    found commons-digester#commons-digester;1.8 in central
    found commons-beanutils#commons-beanutils;1.7.0 in central
    found commons-beanutils#commons-beanutils-core;1.8.0 in central
    found org.apache.hadoop#hadoop-auth;2.7.1 in central
    found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
    found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
    found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
    found org.apache.directory.api#api-util;1.0.0-M20 in central
    found org.apache.curator#curator-framework;2.7.1 in central
    found org.apache.curator#curator-client;2.7.1 in central
    found com.jcraft#jsch;0.1.42 in central
    found org.apache.curator#curator-recipes;2.7.1 in central
    found org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 in central
    found org.apache.hadoop#hadoop-yarn-common;2.7.1 in central
    found org.apache.hadoop#hadoop-yarn-api;2.7.1 in central
    found com.sun.jersey#jersey-client;1.9 in central
    found com.google.inject.extensions#guice-servlet;3.0 in central
    found com.sun.jersey.contribs#jersey-guice;1.9 in central
    found org.slf4j#slf4j-log4j12;1.7.10 in central
    found io.netty#netty;3.6.2.Final in central
    found javax.servlet.jsp#jsp-api;2.1 in central
:: resolution report :: resolve 27998ms :: artifacts dl 2975ms
    :: modules in use:
    aopalliance#aopalliance;1.0 from central in [default]
    asm#asm;3.1 from central in [default]
    ch.qos.logback#logback-classic;1.0.9 from central in [default]
    ch.qos.logback#logback-core;1.0.9 from central in [default]
    com.databricks#spark-avro_2.11;4.0.0 from central in [default]
    com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 from central in [default]
    com.github.stephenc.jcip#jcip-annotations;1.0-1 from central in [default]
    com.google.code.findbugs#jsr305;2.0.1 from central in [default]
    com.google.code.gson#gson;2.2.4 from central in [default]
    com.google.guava#guava;13.0.1 from central in [default]
    com.google.inject#guice;3.0 from central in [default]
    com.google.inject.extensions#guice-assistedinject;3.0 from central in [default]
    com.google.inject.extensions#guice-servlet;3.0 from central in [default]
    com.google.protobuf#protobuf-java;2.5.0 from central in [default]
    com.hortonworks#shc-core;1.1.1-2.1-s_2.11 from repo-1 in [default]
    com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
    com.jcraft#jsch;0.1.42 from central in [default]
    com.lmax#disruptor;3.3.0 from central in [default]
    com.sun.jersey#jersey-client;1.9 from central in [default]
    com.sun.jersey#jersey-core;1.9 from central in [default]
    com.sun.jersey#jersey-json;1.9 from central in [default]
    com.sun.jersey#jersey-server;1.9 from central in [default]
    com.sun.jersey.contribs#jersey-guice;1.9 from central in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    com.yammer.metrics#metrics-core;2.2.0 from central in [default]
    commons-beanutils#commons-beanutils;1.7.0 from central in [default]
    commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
    commons-cli#commons-cli;1.2 from central in [default]
    commons-codec#commons-codec;1.9 from central in [default]
    commons-collections#commons-collections;3.2.2 from central in [default]
    commons-configuration#commons-configuration;1.6 from central in [default]
    commons-digester#commons-digester;1.8 from central in [default]
    commons-el#commons-el;1.0 from central in [default]
    commons-httpclient#commons-httpclient;3.1 from central in [default]
    commons-io#commons-io;2.4 from central in [default]
    commons-lang#commons-lang;2.6 from central in [default]
    commons-logging#commons-logging;1.2 from central in [default]
    commons-net#commons-net;3.1 from central in [default]
    io.dropwizard.metrics#metrics-core;3.1.0 from central in [default]
    io.netty#netty;3.6.2.Final from central in [default]
    io.netty#netty-all;4.0.23.Final from central in [default]
    it.unimi.dsi#fastutil;6.5.6 from central in [default]
    javax.activation#activation;1.1 from central in [default]
    javax.inject#javax.inject;1 from central in [default]
    javax.servlet#servlet-api;2.5 from central in [default]
    javax.servlet.jsp#jsp-api;2.1 from central in [default]
    javax.xml.bind#jaxb-api;2.2.2 from central in [default]
    javax.xml.stream#stax-api;1.0-2 from central in [default]
    jline#jline;2.11 from central in [default]
    joda-time#joda-time;1.6 from central in [default]
    junit#junit;4.12 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
    org.antlr#antlr-runtime;3.5.2 from central in [default]
    org.apache.avro#avro;1.7.6 from central in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.commons#commons-csv;1.0 from central in [default]
    org.apache.commons#commons-math;2.2 from central in [default]
    org.apache.commons#commons-math3;3.1.1 from central in [default]
    org.apache.curator#curator-client;2.7.1 from central in [default]
    org.apache.curator#curator-framework;2.7.1 from central in [default]
    org.apache.curator#curator-recipes;2.7.1 from central in [default]
    org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
    org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
    org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
    org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
    org.apache.hadoop#hadoop-annotations;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-auth;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-common;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-api;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-common;2.7.1 from central in [default]
    org.apache.hbase#hbase-annotations;1.1.3 from central in [default]
    org.apache.hbase#hbase-client;1.1.2 from central in [default]
    org.apache.hbase#hbase-common;1.1.2 from central in [default]
    org.apache.hbase#hbase-prefix-tree;1.1.2 from central in [default]
    org.apache.hbase#hbase-procedure;1.1.2 from central in [default]
    org.apache.hbase#hbase-protocol;1.1.3 from central in [default]
    org.apache.hbase#hbase-server;1.1.2 from central in [default]
    org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
    org.apache.httpcomponents#httpclient;4.0.1 from central in [default]
    org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
    org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 from central in [default]
    org.apache.tephra#tephra-api;0.9.0-incubating from central in [default]
    org.apache.tephra#tephra-core;0.9.0-incubating from central in [default]
    org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating from central in [default]
    org.apache.thrift#libthrift;0.9.0 from central in [default]
    org.apache.twill#twill-api;0.6.0-incubating from central in [default]
    org.apache.twill#twill-common;0.6.0-incubating from central in [default]
    org.apache.twill#twill-core;0.6.0-incubating from central in [default]
    org.apache.twill#twill-discovery-api;0.6.0-incubating from central in [default]
    org.apache.twill#twill-discovery-core;0.6.0-incubating from central in [default]
    org.apache.twill#twill-zookeeper;0.6.0-incubating from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-xc;1.9.2 from central in [default]
    org.codehaus.jettison#jettison;1.1 from central in [default]
    org.hamcrest#hamcrest-core;1.3 from central in [default]
    org.iq80.snappy#snappy;0.3 from central in [default]
    org.jamon#jamon-runtime;2.3.1 from central in [default]
    org.jruby.jcodings#jcodings;1.0.8 from central in [default]
    org.jruby.joni#joni;2.1.2 from central in [default]
    org.mortbay.jetty#jetty;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-sslengine;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
    org.mortbay.jetty#jsp-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#jsp-api-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#servlet-api;2.5-20081211 from central in [default]
    org.mortbay.jetty#servlet-api-2.5;6.1.14 from central in [default]
    org.ow2.asm#asm-all;5.0.2 from central in [default]
    org.slf4j#slf4j-api;1.7.7 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
    org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.5 from central in [default]
    sqlline#sqlline;1.2.0 from central in [default]
    tomcat#jasper-compiler;5.5.23 from central in [default]
    tomcat#jasper-runtime;5.5.23 from central in [default]
    xmlenc#xmlenc;0.52 from central in [default]
    :: evicted modules:
    org.slf4j#slf4j-api;1.7.5 by [org.slf4j#slf4j-api;1.7.7] in [default]
    org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.7] in [default]
    org.apache.hbase#hbase-protocol;1.1.2 by [org.apache.hbase#hbase-protocol;1.1.3] in [default]
    org.apache.hbase#hbase-annotations;1.1.2 by [org.apache.hbase#hbase-annotations;1.1.3] in [default]
    junit#junit;4.11 by [junit#junit;4.12] in [default]
    com.google.guava#guava;12.0.1 by [com.google.guava#guava;13.0.1] in [default]
    com.google.code.findbugs#jsr305;1.3.9 by [com.google.code.findbugs#jsr305;2.0.1] in [default]
    org.slf4j#slf4j-log4j12;1.6.1 by [org.slf4j#slf4j-log4j12;1.7.10] in [default]
    commons-collections#commons-collections;3.2.1 by [commons-collections#commons-collections;3.2.2] in [default]
    commons-lang#commons-lang;2.5 by [commons-lang#commons-lang;2.6] in [default]
    org.apache.httpcomponents#httpclient;4.1.3 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
    org.apache.httpcomponents#httpcore;4.1.3 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
    org.apache.zookeeper#zookeeper;3.4.5 by [org.apache.zookeeper#zookeeper;3.4.6] in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.2 by [org.codehaus.jackson#jackson-core-asl;1.9.13] in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.2 by [org.codehaus.jackson#jackson-mapper-asl;1.9.13] in [default]
    org.apache.httpcomponents#httpcore;4.0.1 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
    commons-codec#commons-codec;1.7 by [commons-codec#commons-codec;1.9] in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.2 by [org.codehaus.jackson#jackson-jaxrs;1.9.13] in [default]
    org.apache.httpcomponents#httpclient;4.2.5 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
    org.apache.avro#avro;1.7.4 by [org.apache.avro#avro;1.7.6] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |  142  |   9   |   9   |   20  ||  122  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 122 already retrieved (0kB/387ms)
18/07/12 03:02:08 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 192.168.116.128 instead (on interface eth1)
18/07/12 03:02:08 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:>                                                          (0 + 1) / 1]18/07/12 03:04:37 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan;
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.org$apache$spark$sql$execution$datasources$hbase$HBaseTableScanRDD$$buildScan(HBaseTableScan.scala:223)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:280)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:279)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.compute(HBaseTableScan.scala:279)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
As far as I understand, the problem is that Hortonworks built shc-core against HBase 1.1.2 dependencies, while I am using HBase 1.2.0. It may be that some classes are missing from the JARs loaded from the central Maven repo for HBase 1.1.2. Please correct me if I am wrong; I am not sure this is the root cause of the error.

I found an explanation of this error:

java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan
here:

Can I solve this without building the sources locally? Some answers say that rebuilding does not solve it. Or is there another explanation for using HBase from PySpark?


Please advise why reading from HBase fails, and how I can avoid this problem.

This problem is usually caused by the installed version differing from the version your project depends on, or by the dependencies coming from different sources. Check which HBase version your project is actually pulling in.
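Two quick checks can confirm such a mismatch. The first command prints the HBase version installed on the cluster; the second lists the hbase-* jars that Ivy actually resolved for the job (the ~/.ivy2/jars path matches the log above):

hbase version
ls ~/.ivy2/jars | grep hbase

If the resolved jars are 1.1.2 while the cluster runs 1.2.0, one commonly suggested workaround is to put the cluster's own HBase client jars ahead of the transitive ones on the classpath. This is a sketch under assumptions, not a verified fix: the parcel path is a guess for a typical CDH 5.13 layout, and your_script.py is a placeholder:

# Assumed CDH parcel location of the cluster's HBase 1.2.0 client jars
HBASE_LIB=/opt/cloudera/parcels/CDH/lib/hbase/lib

spark-submit \
    --master local[*] \
    --repositories http://repo.hortonworks.com/content/repositories/releases/ \
    --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
    --conf spark.driver.extraClassPath="$HBASE_LIB/*" \
    --conf spark.executor.extraClassPath="$HBASE_LIB/*" \
    your_script.py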