Python PySpark object as DataFrame - TypeError
EDIT: SOLVED. I think the problem was the multi-dimensional array produced by ELMo inference. I averaged all the vectors and used the final averaged vector over all the words in a sentence as the output, which can now be converted to a DataFrame. Now I have to make it faster; I will check back after trying threads.

I am trying to generate ELMo embeddings for sentences in a PySpark DataFrame using the ElmoForManyLangs pretrained model, following its GitHub repo. However, I cannot convert the resulting object to a DataFrame.
0  My product name...  0  [[0.1606223, 0.09298285, -0.3494971, 0.2...
[1 rows x 3 columns]
wordsPd.dtypes
product_name object
description object
embeddings object
dtype: object
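The `object` dtype on the `embeddings` column is the clue: Spark's schema inference only understands plain Python types, so a column holding raw NumPy arrays (or `np.float32` scalars) triggers the `Can not infer schema for type` error above. A minimal sketch of the difference, using a random array as a hypothetical stand-in for one ELMo sentence embedding of shape `(num_words, dim)`:

```python
import numpy as np

# hypothetical stand-in for one ELMo sentence embedding: (num_words, dim)
emb = np.random.rand(5, 4).astype(np.float32)

# Raw ndarrays and np.float32 scalars are not types Spark can infer a schema for
print(type(emb))        # <class 'numpy.ndarray'>
print(type(emb[0][0]))  # <class 'numpy.float32'>

# Averaging over words and calling .tolist() yields native Python floats,
# which Spark's inference can handle as an array column
avg = np.average(emb, axis=0).tolist()
print(type(avg), type(avg[0]))  # <class 'list'> <class 'float'>
```

This is why averaging the per-word vectors and calling `.tolist()` (the fix described in the edit above) makes the conversion work.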
Below is the error when creating the DataFrame:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1094, in _infer_schema
raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in _infer_schema
fields = [StructField(k, _infer_type(v), True) for k, v in items]
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in <listcomp>
fields = [StructField(k, _infer_type(v), True) for k, v in items]
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
.........
.........
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
...........
...........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 367, in <module>
raise Exception(traceback.format_exc())
.........
.........
raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
.........
.........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
........
........
........
........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 17, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 691, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
........
........
........
........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
I needed to aggregate the vectors, merging the multi-dimensional array into a single list, using the following:
for t in wordsPd.itertuples():
    # average each (num_words, dim) sentence array over its words, then over
    # sentences, so each row yields one flat list of plain Python floats
    new_list.append(np.average(np.array([np.average(x, axis=0) for x in e.sents2elmo(t[2])]), axis=0).tolist())
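For reference, the aggregation can be sketched end to end with a stub in place of `e.sents2elmo` (the stub `fake_sents2elmo`, the sample `rows`, and the assumption that each sentence yields a `(num_words, 1024)` float32 array are illustrative, not part of the original code):

```python
import numpy as np

DIM = 1024  # ElmoForManyLangs embedding dimension

def fake_sents2elmo(sentences):
    # illustrative stand-in for ELMo inference: one (num_words, DIM)
    # float32 array per input sentence
    return [np.random.rand(len(s.split()), DIM).astype(np.float32)
            for s in sentences]

rows = [["my product name", "a short description here"]]  # sample data
new_list = []
for sentences in rows:
    # average over words within each sentence, then over sentences
    sent_vecs = [np.average(x, axis=0) for x in fake_sents2elmo(sentences)]
    new_list.append(np.average(np.array(sent_vecs), axis=0).tolist())

print(len(new_list[0]))      # 1024: one flat vector per row
print(type(new_list[0][0]))  # <class 'float'>: schema inference can handle this
```

Each row now maps to a single flat `list` of Python floats, which `createDataFrame` can infer as an array column.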