Python ValueError: could not convert string to float: '���'

python numpy pyspark

I have a (2M, 23)-dimensional numpy array X. Its dtype is <U26.

>>> X
array([['143347', '1325', '28.19148936', ..., '61', '0', '0'],
       ['50905', '0', '0', ..., '110', '0', '0'],
       ['143899', '1325', '28.80434783', ..., '61', '0', '0'],
       ...,
       ['85', '0', '0', ..., '1980', '0', '0'],
       ['233', '54', '27', ..., '-1', '0', '0'],
       ['���', '�', '�����', ..., '�', '��', '���']], dtype='<U26')

A question mark in a tf.TensorShape indicates that the dimension is not fixed in the graph and may vary between run calls.

Tensors returned by Session.run or eval are generally NumPy arrays:

>>> print(type(tf.Session().run(tf.constant([1,2,3]))))
<class 'numpy.ndarray'>

Or:

>>> sess = tf.InteractiveSession()
>>> print(type(tf.constant([1,2,3]).eval()))
<class 'numpy.ndarray'>

Or, equivalently:

>>> sess = tf.Session()
>>> with sess.as_default():
...     print(type(tf.constant([1,2,3]).eval()))
<class 'numpy.ndarray'>

Not every tensor returned by Session.run or eval() is a NumPy array. For example, sparse tensors are returned as SparseTensorValue:

>>> print(type(tf.Session().run(tf.SparseTensor([[0, 0]],[1],[1,2]))))
<class 'tensorflow.python.framework.sparse_tensor.SparseTensorValue'>


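Though not the best solution, I had some success by converting the array to a pandas DataFrame and working from there.

The code snippet, with a sample input and the resulting output, is in the last listings below.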

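How are you reading this data? � is the Unicode replacement character, which appears when non-ASCII text is read using the wrong codepage. It looks like the source contains non-numeric data that was read with the wrong codepage. Even with the correct codepage, the text would still not be a valid number.

If you were the Python interpreter, how would you convert '���' to a float? Which number does it represent? What result do you want?

@PanagiotisKanavos: I read it from a pyspark DataFrame using the collect() method.

@zvone: Exactly! I wish I knew what the URC (???) was originally. The desired result is an array of floats.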

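To make the point in the comments concrete, here is a minimal sketch of how the replacement character (U+FFFD) can end up in a string and why float() then fails. The sample bytes and the helper name try_float are made up for illustration; nothing here is taken from the asker's actual data pipeline.

```python
# Hypothetical input: the text "café" encoded as Latin-1, which is not valid UTF-8.
raw = "café".encode("latin-1")

# Decoding with the wrong codec and errors="replace" substitutes U+FFFD
# for the bytes that cannot be decoded.
decoded = raw.decode("utf-8", errors="replace")
print("\ufffd" in decoded)


def try_float(text):
    """Return float(text), or None when text is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


print(try_float("28.19148936"))
print(try_float("\ufffd\ufffd\ufffd"))
```

Once the original bytes have been replaced by U+FFFD, the information is gone, so the practical options are to fix the decoding at the source or to decide how to treat the unparseable values.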

X = np_dfr[:,0:22]
Y = np_dfr[:,-1]

>>> X
array([['143347', '1325', '28.19148936', ..., '61', '0', '0'],
       ['50905', '0', '0', ..., '110', '0', '0'],
       ['143899', '1325', '28.80434783', ..., '61', '0', '0'],
       ...,
       ['85', '0', '0', ..., '1980', '0', '0'],
       ['233', '54', '27', ..., '-1', '0', '0'],
       ['���', '�', '�����', ..., '�', '��', '���']], dtype='<U26')
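Before converting, it may help to find out which rows contain the replacement character. The following is a sketch only, using a small made-up array in place of the real (2M, 23) one; it assumes that dropping the affected rows is acceptable, which the question does not state.

```python
import numpy as np

# Small hypothetical stand-in for the question's array.
X = np.array([['85', '0', '1980'],
              ['233', '54', '-1'],
              ['\ufffd\ufffd\ufffd', '\ufffd', '\ufffd\ufffd']])

# np.char.find returns, per element, the index of the substring or -1 if absent.
has_urc = np.char.find(X, '\ufffd') >= 0
bad_rows = has_urc.any(axis=1)

# Indices of the rows that cannot be parsed as numbers.
print(np.flatnonzero(bad_rows))

# Convert only the rows that are free of the replacement character.
X_clean = X[~bad_rows].astype(float)
print(X_clean)
```

If rows are dropped from X this way, the same mask would need to be applied to Y (for example Y[~bad_rows]) so the two stay aligned.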

>>> print(type(tf.Session().run(tf.constant([1,2,3]))))
<class 'numpy.ndarray'>

>>> sess = tf.InteractiveSession()
>>> print(type(tf.constant([1,2,3]).eval()))
<class 'numpy.ndarray'>

>>> sess = tf.Session()
>>> with sess.as_default():
...     print(type(tf.constant([1,2,3]).eval()))
<class 'numpy.ndarray'>

>>> print(type(tf.Session().run(tf.SparseTensor([[0, 0]],[1],[1,2]))))
<class 'tensorflow.python.framework.sparse_tensor.SparseTensorValue'>

import pandas as pd

# convert X into a DataFrame
X_pd = pd.DataFrame(data=X)
# replace all instances of the URC (U+FFFD, '�') with 0
X_replace = X_pd.replace('�', 0, regex=True)
# convert it back to a numpy array
X_np = X_replace.values
# cast the array to float
X_fa = X_np.astype(float)

Input:

array([['85', '0', '0', '1980', '0', '0'],
       ['233', '54', '27', '-1', '0', '0'],
       ['���', '�', '�����', '�', '��', '���']], dtype='<U5')

Output:

array([[ 8.50e+01,  0.00e+00,  0.00e+00,  1.98e+03,  0.00e+00,  0.00e+00],
       [ 2.33e+02,  5.40e+01,  2.70e+01, -1.00e+00,  0.00e+00,  0.00e+00],
       [ 0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00]])
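Replacing the bad values with 0 makes them indistinguishable from real zeros. As a possible alternative (not what the answer above does), pd.to_numeric with errors='coerce' turns anything unparseable into NaN, so the affected cells stay identifiable. This is a sketch on the same three sample rows, not a tested solution for the full dataset.

```python
import numpy as np
import pandas as pd

# Same sample rows as in the input above, with U+FFFD written as an escape.
X = np.array([
    ['85', '0', '0', '1980', '0', '0'],
    ['233', '54', '27', '-1', '0', '0'],
    ['\ufffd' * 3, '\ufffd', '\ufffd' * 5, '\ufffd', '\ufffd' * 2, '\ufffd' * 3],
])

# errors='coerce' converts every value that cannot be parsed into NaN.
X_num = pd.DataFrame(X).apply(pd.to_numeric, errors='coerce')
X_fa = X_num.to_numpy(dtype=float)

# Boolean mask of rows that contained at least one unparseable value.
print(np.isnan(X_fa).any(axis=1))
```

From there the NaN rows can be dropped, imputed, or inspected, depending on what the data is used for.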