Python PySpark object as DataFrame - TypeError
EDIT: SOLVED. I think the problem was the multi-dimensional array produced by ELMo inference. I averaged all the vectors and used the final averaged vector over all the words in a sentence as the output, which can now be converted to a DataFrame. Now I have to make it faster; I will check back after trying threads.

I am trying to generate ELMo embeddings for sentences in a PySpark DataFrame using the ElmoForManyLangs pretrained model, following its GitHub repo. However, I cannot convert the resulting object to a DataFrame.
0  My product name...  0  [[0.1606223, 0.09298285, -0.3494971, 0.2...
[1 rows x 3 columns]
wordsPd.dtypes
product_name object
description object
embeddings object
dtype: object
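The `object` dtype on the `embeddings` column is the clue: Spark's schema inference only understands plain Python types, so a column holding raw NumPy arrays (or `np.float32` scalars) triggers the `Can not infer schema for type` error above. A minimal sketch of the difference, using a random array as a hypothetical stand-in for one ELMo sentence embedding of shape `(num_words, dim)`:

```python
import numpy as np

# hypothetical stand-in for one ELMo sentence embedding: (num_words, dim)
emb = np.random.rand(5, 4).astype(np.float32)

# Raw ndarrays and np.float32 scalars are not types Spark can infer a schema for
print(type(emb))        # <class 'numpy.ndarray'>
print(type(emb[0][0]))  # <class 'numpy.float32'>

# Averaging over words and calling .tolist() yields native Python floats,
# which Spark's inference can handle as an array column
avg = np.average(emb, axis=0).tolist()
print(type(avg), type(avg[0]))  # <class 'list'> <class 'float'>
```

This is why averaging the per-word vectors and calling `.tolist()` (the fix described in the edit above) makes the conversion work.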
Below is the error when creating the DataFrame:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1094, in _infer_schema
raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in _infer_schema
fields = [StructField(k, _infer_type(v), True) for k, v in items]
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in <listcomp>
fields = [StructField(k, _infer_type(v), True) for k, v in items]
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
.........
.........
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
...........
...........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 367, in <module>
raise Exception(traceback.format_exc())
.........
.........
raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
.........
.........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
return _infer_schema(obj)
........
........
........
........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 17, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 691, in createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
........
........
........
........
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
I needed to aggregate the vectors, merging the multi-dimensional array into a single list, using the following:
for t in wordsPd.itertuples():
    # average each (num_words, dim) sentence array over its words, then over
    # sentences, so each row yields one flat list of plain Python floats
    new_list.append(np.average(np.array([np.average(x, axis=0) for x in e.sents2elmo(t[2])]), axis=0).tolist())
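For reference, the aggregation can be sketched end to end with a stub in place of `e.sents2elmo` (the stub `fake_sents2elmo`, the sample `rows`, and the assumption that each sentence yields a `(num_words, 1024)` float32 array are illustrative, not part of the original code):

```python
import numpy as np

DIM = 1024  # ElmoForManyLangs embedding dimension

def fake_sents2elmo(sentences):
    # illustrative stand-in for ELMo inference: one (num_words, DIM)
    # float32 array per input sentence
    return [np.random.rand(len(s.split()), DIM).astype(np.float32)
            for s in sentences]

rows = [["my product name", "a short description here"]]  # sample data
new_list = []
for sentences in rows:
    # average over words within each sentence, then over sentences
    sent_vecs = [np.average(x, axis=0) for x in fake_sents2elmo(sentences)]
    new_list.append(np.average(np.array(sent_vecs), axis=0).tolist())

print(len(new_list[0]))      # 1024: one flat vector per row
print(type(new_list[0][0]))  # <class 'float'>: schema inference can handle this
```

Each row now maps to a single flat `list` of Python floats, which `createDataFrame` can infer as an array column.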