Pyspark error when querying Cassandra to convert to a DataFrame


I get the following error when executing this command:

user = sc.cassandraTable("DB NAME", "TABLE NAME").toDF()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/src/spark/spark-1.4.1/python/pyspark/sql/context.py", line 60, in toDF
    return sqlContext.createDataFrame(self, schema, sampleRatio)
  File "/usr/local/src/spark/spark-1.4.1/python/pyspark/sql/context.py", line 333, in createDataFrame
    schema = self._inferSchema(rdd, samplingRatio)
  File "/usr/local/src/spark/spark-1.4.1/python/pyspark/sql/context.py", line 220, in _inferSchema
    raise ValueError("Some of types cannot be determined by the "
ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

Load directly into a DataFrame instead; this also avoids any Python-level code for inferring types:

sqlContext.read.format("org.apache.spark.sql.cassandra").options(keyspace="ks",table="tb").load()

After digging into the PySpark source, I found that the error occurs because, when creating the DataFrame, PySpark inspects the first 100 rows of the RDD to infer each field's type. If a field is null in all 100 sampled records, its type cannot be determined and this error is thrown.
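The failure mode can be sketched in plain Python. This is an illustrative mock, not PySpark's actual source: the `infer_field_types` helper and `SAMPLE_SIZE` constant are made up here to show why a column that is null throughout the sampled window defeats type inference even when later rows contain values.

```python
# Illustrative sketch (not PySpark's real implementation): mimic how
# schema inference fails when a field is None in every sampled row.

SAMPLE_SIZE = 100  # PySpark 1.4 inspects the first 100 rows by default

def infer_field_types(rows, sample_size=SAMPLE_SIZE):
    """Infer a type for each field from the first `sample_size` rows.

    Returns a dict mapping field name -> Python type, or raises
    ValueError when a field is None in every sampled row (its type is
    undeterminable), mirroring the error in the traceback above.
    """
    sample = rows[:sample_size]
    if not sample:
        raise ValueError("cannot infer schema from an empty sample")
    types = {}
    for field in sample[0].keys():
        seen = {type(r[field]) for r in sample if r[field] is not None}
        if not seen:
            raise ValueError(
                "Some of types cannot be determined by the "
                "first %d rows, please try again with sampling" % sample_size
            )
        types[field] = seen.pop()  # assume one consistent type per field
    return types

# 200 rows; "email" is only populated past the 100-row sample window,
# so inference fails even though the column does contain data.
rows = [{"id": i, "email": None} for i in range(200)]
rows[150]["email"] = "a@b.c"
try:
    infer_field_types(rows)
except ValueError as e:
    print("inference failed:", e)
```

This is why loading through the Cassandra connector works: the schema is read from Cassandra's own table metadata, so no row sampling is needed at all.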