Apache Spark: Reading a query from MariaDB using PySpark


I am trying to read the result of a query from MariaDB into a PySpark dataframe. The jar I am using is

--jars mariadb-java-client-2.2.2.jar
I am able to read a whole table with

df = spark.read.format("jdbc")\
        .option("url","jdbc:mariadb://xxx.xxx.xx.xx:xxxx/hdpms")\
        .option("driver", "org.mariadb.jdbc.Driver")\
        .option("dbtable", Mytable)\
        .option("user", "xxxxx_xxxxx")\
        .option("password", "xxxxx")\
        .load()
Now I am looking for a way to run a simple query such as

SELECT col1,col2,col3,.. From MyTable Where date>2019 and cond2;
Although I can pass the query as

"MyTable date>2019 and cond2 --"
since the jar adds SELECT * FROM at the beginning and
WHERE 1=0
at the end, I am facing the following error

    py4j.protocol.Py4JJavaError: An error occurred while calling o455.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 12, xhadoopm3095p.aetna.com, executor 2): java.sql.SQLException: Value "DATE_CREATED" cannot be parse as Timestamp
        at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalTimestamp(TextRowProtocol.java:592)
        at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getTimestamp(SelectResultSet.java:1178)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$11.apply(JdbcUtils.scala:439)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$11.apply(JdbcUtils.scala:438)
Can anyone help me with this? Thanks, everyone.
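
To make the wrapping described in the question concrete, here is a minimal sketch in plain Python. The template string is an assumption taken from the question's own description (SELECT * FROM in front, WHERE 1=0 behind), and `schema_probe` is a hypothetical helper name. The second input is an illustrative variant of the question's string with an explicit WHERE added.

```python
# Simplified model of the wrapping described in the question.
# The template is an assumption based on that description, not library code.
SCHEMA_PROBE_TEMPLATE = "SELECT * FROM {dbtable} WHERE 1=0"


def schema_probe(dbtable: str) -> str:
    """Return the statement produced by wrapping a dbtable value."""
    return SCHEMA_PROBE_TEMPLATE.format(dbtable=dbtable)


# A plain table name yields a valid statement that returns no rows.
print(schema_probe("MyTable"))

# The trailing "--" turns the appended WHERE 1=0 into a SQL comment,
# which is why the trick gets past this step.
print(schema_probe("MyTable WHERE date>2019 --"))
```

This only models the string handling. It says nothing about how later statements against the same dbtable value are built, which is where the reported Timestamp error is raised.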


Pass the query as the table, wrapped in parentheses with an alias, and it will work

df = spark.read.format("jdbc")\
        .option("url","jdbc:mariadb://xxx.xxx.xx.xx:xxxx/hdpms")\
        .option("driver", "org.mariadb.jdbc.Driver")\
        .option("dbtable", "(SELECT col1,col2,col3,.. From MyTable Where date>2019 and cond2) tmp")\
        .option("user", "xxxxx_xxxxx")\
        .option("password", "xxxxx")\
        .load()
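
The same idea can be wrapped in a small helper so the parentheses and alias are not typed by hand each time. This is a sketch under stated assumptions: `as_subquery` and `read_query` are hypothetical names, the `spark` argument is assumed to be an existing SparkSession, and the connection values are whatever you already pass in the snippet above.

```python
def as_subquery(sql: str, alias: str = "tmp") -> str:
    """Wrap a SELECT statement as '(<sql>) <alias>' for the dbtable option."""
    # A trailing semicolon would be a syntax error inside the parentheses.
    cleaned = sql.strip().rstrip(";").strip()
    return f"({cleaned}) {alias}"


def read_query(spark, url: str, sql: str, user: str, password: str):
    """Load the result of `sql` through the JDBC source of an existing session."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("driver", "org.mariadb.jdbc.Driver")
        .option("dbtable", as_subquery(sql))
        .option("user", user)
        .option("password", password)
        .load()
    )


# The string helper can be checked without a cluster or a database.
print(as_subquery("SELECT col1, col2 FROM MyTable WHERE date > 2019;"))
# prints: (SELECT col1, col2 FROM MyTable WHERE date > 2019) tmp
```

With a live session this would be called as `df = read_query(spark, url, sql, user, password)`. Spark 2.4 and later also accept a `query` option in place of `dbtable`, which may remove the need for the alias. Whether that applies depends on the Spark version in use.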