Apache Spark AttributeError: 'StructField' object has no attribute '_get_object_id': loading parquet files with a custom schema
Tags: apache-spark, pyspark, apache-spark-sql, pyspark-sql

I am trying to read a group of parquet files with a custom schema using PySpark, but it raises AttributeError: 'StructField' object has no attribute '_get_object_id'. Here is my sample code:
import pyspark
from pyspark.sql import SQLContext, SparkSession
from pyspark.sql import Row
import pyspark.sql.functions as func
from pyspark.sql.types import *
sc = pyspark.SparkContext()
spark = SparkSession(sc)
sqlContext = SQLContext(sc)
l = [('1',31200,'Execute',140,'ABC'),('2',31201,'Execute',140,'ABC'),('3',31202,'Execute',142,'ABC'),
('4',31103,'Execute',149,'DEF'),('5',31204,'Execute',145,'DEF'),('6',31205,'Execute',149,'DEF')]
rdd = sc.parallelize(l)
trades = rdd.map(lambda x: Row(global_order_id=int(x[0]), nanos=int(x[1]),message_type=x[2], price=int(x[3]),symbol=x[4]))
trades_df = sqlContext.createDataFrame(trades)
trades_df.printSchema()
trades_df.write.parquet('trades_parquet')
trades_df_Parquet = sqlContext.read.parquet('trades_parquet')
trades_df_Parquet.printSchema()
# The schema is encoded in a string.
schemaString = "global_order_id message_type nanos price symbol"
fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
schema = StructType(fields)
# This line raises the AttributeError
trades_df_Parquet_n = spark.read.format('parquet').load('trades_parquet', schema, inferSchema=False)
#trades_df_Parquet_n = spark.read.parquet('trades_parquet', schema)
trades_df_Parquet_n.printSchema()
Can anyone help me with a suggestion?

Answer: Pass the argument by name, schema=, so that it is not bound to the format parameter:
Signature: spark.read.load(path=None, format=None, schema=None, **options)

and you get:

trades_df_Parquet_n = spark.read.format('parquet').load('trades_parquet', schema=schema, inferSchema=False)
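The root cause can be reproduced without Spark at all. The sketch below is a plain-Python stand-in (the StructField/StructType classes and the load function here are simplified mock-ups, not PySpark itself) whose load mirrors the signature load(path=None, format=None, schema=None, **options): when the schema is passed positionally it is bound to the format parameter, so downstream code receives a StructType where it expects a format string.

```python
# Simplified stand-ins for pyspark.sql.types — not the real PySpark classes.
class StructField:
    def __init__(self, name, dtype, nullable=True):
        self.name, self.dtype, self.nullable = name, dtype, nullable

class StructType:
    def __init__(self, fields):
        self.fields = fields

def load(path=None, format=None, schema=None, **options):
    """Mock of DataFrameReader.load: report which parameter each argument hit."""
    return {"path": path, "format": format, "schema": schema, "options": options}

schema = StructType([StructField("symbol", "string")])

# Positional call, as in the question: the schema lands in the `format` slot.
bound = load("trades_parquet", schema, inferSchema=False)
assert bound["format"] is schema   # wrong slot -> the AttributeError downstream
assert bound["schema"] is None

# Keyword call, as in the answer: the schema reaches the `schema` parameter.
bound = load("trades_parquet", schema=schema, inferSchema=False)
assert bound["schema"] is schema
assert bound["format"] is None
```

This is why naming the argument (schema=schema) fixes the error while the positional call fails.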