PySpark: 'PipelinedRDD' object has no attribute '_get_object_id'


I ran into a problem while trying to reproduce an example I saw here -

When it gets to this line:
hvacTable = sqlContext.createDataFrame(hvac)
the error it returns is:

'PipelinedRDD' object has no attribute '_get_object_id'
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 333, in createDataFrame
    return self.sparkSession.createDataFrame(data, schema, samplingRatio, verifySchema)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1124, in __call__
    args_command, temp_args = self._build_args(*args)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1094, in _build_args
    [get_command_part(arg, self.pool) for arg in new_args])
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 289, in get_command_part
    command_part = REFERENCE_TYPE + parameter._get_object_id()
AttributeError: 'PipelinedRDD' object has no attribute '_get_object_id'
I followed the example as written; it is a PySpark notebook in Jupyter.


Why does this error occur?

You are probably running this on a newer cluster. Update 'sqlContext' to 'spark' and it should work. We will also update the documentation article.

Also, in Spark 2.x you can now do this more simply with DataFrames. The snippet that creates the hvac table can be replaced with the following equivalent:

csvFile = spark.read.csv('wasb:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv', header=True, inferSchema=True)
csvFile.write.saveAsTable("hvac")

Try this:
hvac.take(1)
What does it output? Thank you Steven, I am about to finish work for the day, so I will try your suggestion tomorrow and reply with the output. Kind regards.