Python Pyspark error: dataType &lt;class 'pyspark.sql.types.StringType'&gt; should be an instance of &lt;class 'pyspark.sql.types.DataType'&gt;

Tags: python, apache-spark, pyspark, apache-spark-sql

I need to extract some data from a pipelinedRDD, but when converting it to a DataFrame I get the following error:

Traceback (most recent call last):
  File "/home/karan/Desktop/meds.py", line 42, in <module>
    relevantToSymEntered(newrdd)
  File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
    mat = spark.createDataFrame(self, StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)]))
  File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
    "dataType %s should be an instance of %s" % (dataType, DataType)
AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>
Replace:

StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)])

with:

StructType([StructField("Prescribed medicine", StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType())])

You need to instantiate the class.

Comments:

You seem to have a class called reduceColumns that takes no input arguments, but you pass one after the print(rdd) statement.

dataType - there is probably a missing pair of () in spark.createDataFrame(self, StructType([StructField("Prescribed medicine", StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType(StringType()))])). Also changed the ArrayType check.

@RakeshKumar Thank you... now it really makes sense...