PySpark SQL types: Union[int, float]

The values I receive are usually of type int, but they can also be None or inf, and I build a Spark DataFrame from them. When I declare the field as LongType, PySpark complains because inf is a float:

  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 177, in main
    process()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 172, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark/python/pyspark/sql/session.py", line 567, in prepare
    verify_func(obj, schema)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1355, in _verify_type
    _verify_type(obj.get(f.name), f.dataType, f.nullable)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1329, in _verify_type
    raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: LongType can not accept object inf in type <class 'float'>
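For context, here is a minimal sketch that triggers this kind of error with local data. It is not the original job; the field name, schema, and sample values are illustrative assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, LongType

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical single-field schema with the problematic LongType column.
    schema = StructType([StructField("fieldName", LongType(), nullable=True)])

    # None passes verification for a nullable LongType field, but
    # float("inf") is a float, so schema verification raises
    # "LongType can not accept object inf in type <class 'float'>".
    rows = [{"fieldName": 1}, {"fieldName": None}, {"fieldName": float("inf")}]
    df = spark.createDataFrame(rows, schema)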

How can I support this with pyspark.sql.types?

For now, I just map the field to a float and use DoubleType in the schema:

def convert_field(x):
    try:
        field = x.pop("fieldName")
    except KeyError:
        return x
    return dict(fieldName=float(field) if field is not None else field, **x)

results = ...
createDataFrame(results.map(convert_field), results_schema).cache()
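A fuller, self-contained sketch of this workaround follows. The schema and input records are illustrative assumptions, since the question elides them. Casting every non-None value to float lets one nullable DoubleType column hold ints, None, and inf alike, at the cost of exact integer precision above 2**53:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, DoubleType

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical schema: a nullable DoubleType column instead of LongType.
    results_schema = StructType([StructField("fieldName", DoubleType(), nullable=True)])

    def convert_field(x):
        try:
            field = x.pop("fieldName")
        except KeyError:
            return x
        return dict(fieldName=float(field) if field is not None else field, **x)

    # Illustrative records covering the three cases: int, None, inf.
    raw = spark.sparkContext.parallelize(
        [{"fieldName": 3}, {"fieldName": None}, {"fieldName": float("inf")}]
    )
    df = spark.createDataFrame(raw.map(convert_field), results_schema).cache()
    df.show()  # the column now holds 3.0, a null, and Infinity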