Python PySpark error: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>
I need to extract some data from a PipelinedRDD, but converting it to a DataFrame raises the following error:
Traceback (most recent call last):
  File "/home/karan/Desktop/meds.py", line 42, in <module>
    relevantToSymEntered(newrdd)
  File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
    mat = spark.createDataFrame(self,StructType([StructField("Prescribed medicine",StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType)]))
  File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
    "dataType %s should be an instance of %s" % (dataType, DataType)
AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>
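Replace
StructType([StructField("Prescribed medicine",StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType)])
with:
StructType([StructField("Prescribed medicine",StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType(StringType()))])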
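You need to instantiate the class.
You seem to have a class named reduceColumns that does not accept any input arguments, but you pass one after the print(rdd) statement.
The parentheses () may be missing from the data types: spark.createDataFrame(self,StructType([StructField("Prescribed medicine", StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType(StringType()))])). I also changed ArrayType, check it.
@RakeshKumar Thank you... now it really makes sense.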