
Querying Apache Drill tables from Apache Spark

Tags: apache-spark, pyspark, apache-drill

I am querying Apache Drill from inside Apache Spark. My question is how to send a SQL command to Drill from Spark other than a plain select * from; by default, Spark wraps whatever I pass inside a select * from. Also, when I query a schema other than dfs, I get a NullPointerException. Please help.
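
For reference, the usual workaround with the Spark 2.x JDBC source is to pass the Drill query itself as a parenthesised subquery in the dbtable option instead of a bare table name; Spark still wraps it in a select * from, but the inner SQL is what actually reaches Drill. A minimal sketch (the query text, LIMIT and alias are illustrative assumptions; the double-quoted path assumes the quoting-identifier change mentioned at the end of this post):

    # Push an arbitrary Drill query down through the Spark JDBC source.
    # `foreman` is assumed to be defined as elsewhere in this post; the alias "t"
    # is added because Spark embeds this string into SELECT * FROM <dbtable>,
    # and derived tables generally need an alias.
    drill_query = '(SELECT * FROM dfs."/user/titanic_data/test.csv" LIMIT 10) t'

    df = spark.read.format("jdbc") \
        .option("url", "jdbc:drill:zk=%s;schema=dfs;" % foreman) \
        .option("driver", "org.apache.drill.jdbc.Driver") \
        .option("dbtable", drill_query) \
        .load()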

My Spark version is 2.2.0.

Here is my code:

  • Schema=dfs:

    dataframe_mysql = spark.read.format("jdbc") \
        .option("url", "jdbc:drill:zk=%s;schema=%s;" % (foreman, schema)) \
        .option("driver", "org.apache.drill.jdbc.Driver") \
        .option("dbtable", "\"/user/titanic_data/test.csv\"") \
        .load()
  • Schema=MySQL

    dataframe_mysql = spark.read.format("jdbc") \
        .option("url", "jdbc:drill:zk=%s;schema=mysql;" % (foreman)) \
        .option("driver", "org.apache.drill.jdbc.Driver") \
        .option("dbtable", "mysql.\"spark3\"") \
        .load()

  • This is the full error:

    Name: org.apache.toree.interpreter.broker.BrokerException
    Message: Py4JJavaError: An error occurred while calling o40.load.
    : java.sql.SQLException: Failed to create prepared statement: SYSTEM ERROR: NullPointerException
    
    
    [Error Id: d1e4b310-f4df-4e7c-90ae-983cc5c89f94 on inpunpclx1825e.kih.kmart.com:31010]
        at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newServerPreparedStatement(DrillJdbc41Factory.java:147)
        at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newPreparedStatement(DrillJdbc41Factory.java:108)
        at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newPreparedStatement(DrillJdbc41Factory.java:50)
        at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:278)
        at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareStatement(DrillConnectionImpl.java:389)
        at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:119)
        at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareStatement(DrillConnectionImpl.java:422)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
    
    (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o40.load.\n', JavaObject id=o41), <traceback object at 0x7f00106d6488>)
    StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
    org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
    scala.Option.foreach(Option.scala:257)
    org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:162)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    py4j.Gateway.invoke(Gateway.java:280)
    py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    py4j.commands.CallCommand.execute(CallCommand.java:79)
    py4j.GatewayConnection.run(GatewayConnection.java:214)
    java.lang.Thread.run(Thread.java:748)
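
The resolveTable frame in the trace above means the failure happens while Spark is resolving the DataFrame schema, before any rows are read: in Spark 2.2 the JDBC source prepares a statement of the form SELECT * FROM <dbtable> WHERE 1=0. A diagnostic sketch (an assumption, not part of the original post) that reproduces just that statement against Drill from the same PySpark session through the Py4J gateway, provided the Drill JDBC jar is already on the driver classpath:

    # Prepare the same kind of statement Spark sends during schema resolution,
    # using the Drill JDBC driver and the JVM behind the PySpark session.
    jvm = spark._jvm
    jvm.java.lang.Class.forName("org.apache.drill.jdbc.Driver")
    conn = jvm.java.sql.DriverManager.getConnection(
        "jdbc:drill:zk=%s;schema=mysql;" % foreman)
    stmt = conn.prepareStatement('SELECT * FROM mysql."spark3" WHERE 1=0')
    stmt.close()
    conn.close()

If this fails with the same NullPointerException, the problem lies in how Drill prepares statements against the mysql storage plugin rather than in the Spark options.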
    

I have changed the default Drill quoting identifier from backticks (`) to double quotes ("), so there should not be any quoting-identifier issues between Spark and Drill.

What is the exact error? A stack trace might help with your question. Where exactly does the error occur? The full stack would be better. I am not sure what is causing the NPE; please check the error. Thank you very much. This works fine for the HDFS file; when I run it for the "spark3" table in MySQL, it fails with the error above.
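
If Drill keeps failing on the MySQL-backed table, one possible workaround (a sketch, not something suggested in the thread) is to bypass Drill for that table and let Spark read it directly over MySQL's own JDBC driver. The host, database, credentials and connector class below are placeholders:

    # Read the MySQL table directly with the MySQL JDBC driver instead of going
    # through Drill; requires the MySQL Connector/J jar on the Spark classpath.
    spark3_df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://mysql-host:3306/mydb") \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", "spark3") \
        .option("user", "mysql_user") \
        .option("password", "mysql_password") \
        .load()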