
Apache Spark: error when trying to read an Athena table in Spark

Tags: apache-spark, pyspark, amazon-emr, amazon-sagemaker

I have the following code snippet in PySpark:

import pandas as pd
from pyspark import SparkContext, SparkConf
from pyspark.context import SparkContext
from pyspark.sql import Row, SQLContext, SparkSession
import pyspark.sql.dataframe

def validate_data():
    conf = SparkConf().setAppName("app")
    spark = SparkContext(conf=conf)
    config = {
    "val_path" : "s3://forecasting/data/validation.csv"
    }

    data1_df = spark.read.table("db1.data_dest”)
    data2_df = spark.read.table("db2.data_source”)
    print(data1_df.count())
    print(data2_df.count())


if __name__ == "__main__":
    validate_data()
Now, this code works fine when run from a Jupyter notebook on SageMaker (connected to EMR).

But when we run it as a Python script from the terminal, it throws the following error.

Error message:

AttributeError: 'SparkContext' object has no attribute 'read'

We have to automate these notebooks, so we are trying to convert them to Python scripts.

You can only call read on a Spark session, not on a Spark context. In your script, spark is actually a SparkContext, which has no read attribute; in the SageMaker notebook, spark is typically a ready-made SparkSession, which is why the same code works there. Build a SparkSession instead:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("app")
# builder.config(...) only returns a Builder; getOrCreate() creates the session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
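
With the session in hand, the reads from the question go through spark.read (same table names as in the question):

data1_df = spark.read.table("db1.data_dest")
data2_df = spark.read.table("db2.data_source")
print(data1_df.count())
print(data2_df.count())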
You can also turn an existing Spark context into a Spark session:

conf = SparkConf().setAppName("app")
sc = SparkContext(conf=conf)
spark = SparkSession(sc)
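
For completeness, here is a minimal rewrite of the original validate_data script with this fix applied. This is a sketch that keeps the question's app name and table names; enableHiveSupport() is an assumption added so the standalone script can resolve the Glue/Hive catalog behind the Athena tables:

from pyspark.sql import SparkSession

def validate_data():
    # Build the SparkSession directly; enableHiveSupport() lets the
    # script look up catalog tables such as db1.data_dest.
    spark = SparkSession.builder \
        .appName("app") \
        .enableHiveSupport() \
        .getOrCreate()

    data1_df = spark.read.table("db1.data_dest")
    data2_df = spark.read.table("db2.data_source")
    print(data1_df.count())
    print(data2_df.count())

    spark.stop()

if __name__ == "__main__":
    validate_data()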