Python: pyspark object as dataframe - TypeError


EDIT: SOLVED
I think the problem was the multi-dimensional array produced by the Elmo inference. I averaged all the vectors and used the final averaged vector over all the words of a sentence as the output, which can now be converted into a dataframe (the code is at the end of the question). Now I have to make it faster; I will look into using threads and report back.

I am trying to generate Elmo embeddings for the sentences in a pyspark dataframe using the ElmoForManyLangs pre-trained model, following its GitHub instructions. However, I am not able to convert the resulting object into a dataframe.
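
The code that produced the embeddings is not included in the question; below is a minimal sketch of the setup, with the model path, the Embedder variable e, the pandas dataframe wordsPd and the tokenised description column assumed from the traceback and from the fix at the end.

# hypothetical reconstruction of the attempt; the path and variable names are assumptions
from elmoformanylangs import Embedder

e = Embedder('/path/to/elmo/model')   # pre-trained ElmoForManyLangs model directory

# sents2elmo returns one (n_words x 1024) numpy array per tokenised sentence,
# so the 'embeddings' column ends up holding lists of numpy arrays
wordsPd['embeddings'] = wordsPd['description'].apply(lambda sents: e.sents2elmo(sents))

# roughly the call that raises the TypeError shown below
wordsDf = spark.createDataFrame(wordsPd)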


The pandas dataframe wordsPd looks like this:

0  my product name...  0  [[0.1606223, 0.09298285, -0.3494971, 0.2...

[1 rows x 3 columns]

wordsPd.dtypes

product_name    object 
description      object 
embeddings    object 
dtype: object
Below is the error when creating the dataframe:

Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
    return _infer_schema(obj)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1094, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
    return _infer_schema(obj)
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in _infer_schema
    fields = [StructField(k, _infer_type(v), True) for k, v in items]
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1096, in <listcomp>
    fields = [StructField(k, _infer_type(v), True) for k, v in items]
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
    return _infer_schema(obj)
.........
.........
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
    exec(code, _zcUserQueryNameSpace)
...........
...........  
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 367, in <module>
    raise Exception(traceback.format_exc())
.........
.........
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
    return _infer_schema(obj)
.........
.........
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'object'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1068, in _infer_type
    return _infer_schema(obj)
........
........
........
........
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.indexes.range.RangeIndex'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7355529425587840217.py", line 360, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 17, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 691, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
........
........
........
........
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1070, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <class 'pandas.core.series.Series'>
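
The inference fails because the object-dtype columns hold the raw multi-dimensional sents2elmo output (numpy arrays) rather than plain Python types that _infer_type can map. A quick, hypothetical check of what Spark is being asked to infer, assuming wordsPd is the pandas dataframe shown above:

cell = wordsPd['embeddings'].iloc[0]
print(type(cell))      # list of numpy.ndarray, one (n_words x 1024) matrix per sentence
print(type(cell[0]))   # <class 'numpy.ndarray'> - no matching Spark SQL type can be inferred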

I needed to aggregate the vectors with the following approach, collapsing the multi-dimensional array into a single list:

import numpy as np

new_list = []
for t in wordsPd.itertuples():
    # average over the words of each sentence, then over the sentences: one flat list of floats per row
    new_list.append(np.average(np.array([np.average(x, axis=0) for x in e.sents2elmo(t[2])]), axis=0).tolist())
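
With the embeddings reduced to flat lists of floats, the conversion mentioned in the edit can be done with an explicit schema. The exact conversion code is not shown in the question; below is a minimal sketch assuming the column names above, that new_list is written back into wordsPd, and that a spark session is available.

# hypothetical conversion sketch, not the asker's exact code
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, DoubleType

wordsPd['embeddings'] = new_list   # one flat list of floats per row

schema = StructType([
    StructField('product_name', StringType(), True),
    StructField('embeddings', ArrayType(DoubleType()), True),
])
wordsDf = spark.createDataFrame(wordsPd[['product_name', 'embeddings']].values.tolist(), schema)

Supplying the schema explicitly side-steps the _infer_type calls that were failing above.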
Possible duplicate.