Python: converting a column that holds numpy arrays gives a numpy array with dtype object
I have a column in a DataFrame that holds numpy arrays of length 10. My DataFrame looks like this:
0 [2.0, 1246.0, 82.0, 43.0, 569.0, 46.0, 424.0, ...
1 [395.0, 2052.0, 1388.0, 8326.0, 5257.0, 176.0,...
10 [4.0, 1.0, 13.0, 1409.0, 7742.0, 259.0, 1856.0...
100 [4.0, 87.0, 1595.0, 706.0, 2935.0, 6028.0, 442...
1000 [45.0, 582.0, 124.0, 6530.0, 6548.0, 748.0, 61...
Name: embedding1, dtype: object
When I convert it to a numpy array of arrays with the following command:
input = np.asarray(df.tolist())
it gives an array like this:
array([array([ 2., 1246., 82., 43., 569., 46., 424., 446., 1054., 39.]),
array([4.0000e+00, 1.0000e+00, 1.3000e+01, 1.4090e+03, 7.7420e+03,
2.5900e+02, 1.8560e+03, 3.6181e+04, 4.2000e+01, 8.9000e+02]),
...,
array([4.000e+00, 1.000e+00, 1.300e+01, 2.900e+01, 4.930e+02, 2.760e+02,1.100e+01, 6.770e+02, 6.740e+02, 5.806e+03]),], dtype=object)
The dtype it gives is object. I want the dtype to be float: as it stands the array has shape (1000,), but I want the shape to be (1000, 10). I tried this:
input1 = np.asarray(df1.tolist(),dtype=np.float)
But it gives the following error:
ValueError: setting an array element with a sequence.
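How can I solve this?
PS: All elements of the numpy arrays in the DataFrame's rows are floats.

First, it looks like you have a pd.Series rather than a DataFrame.
Take the following setup as an example: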
import numpy as np
import pandas as pd

x = [[2.0, 1246.0, 82.0, 43.0, 569.0, 46.0, 424.0],
     [395.0, 2052.0, 1388.0, 8326.0, 5257.0, 176.0],
     [4.0, 1.0, 13.0, 1409.0, 7742.0, 259.0, 1856.0],
     [4.0, 87.0, 1595.0, 706.0, 2935.0, 6028.0, 442],
     [45.0, 582.0, 124.0, 6530.0, 6548.0, 748.0, 61]]
s = pd.Series(x)
This produces:
0 [2.0, 1246.0, 82.0, 43.0, 569.0, 46.0, 424.0]
1 [395.0, 2052.0, 1388.0, 8326.0, 5257.0, 176.0]
2 [4.0, 1.0, 13.0, 1409.0, 7742.0, 259.0, 1856.0]
3 [4.0, 87.0, 1595.0, 706.0, 2935.0, 6028.0, 442]
4 [45.0, 582.0, 124.0, 6530.0, 6548.0, 748.0, 61]
dtype: object
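You have a pd.Series of arrays, and it looks like you want to flatten it. Calling the default constructor on a list of lists produces a DataFrame in which each list is interpreted as a row: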
df2 = pd.DataFrame(s.tolist())
0 1 2 3 4 5 6
0 2.0 1246.0 82.0 43.0 569.0 46.0 424.0
1 395.0 2052.0 1388.0 8326.0 5257.0 176.0 NaN
2 4.0 1.0 13.0 1409.0 7742.0 259.0 1856.0
3 4.0 87.0 1595.0 706.0 2935.0 6028.0 442.0
4 45.0 582.0 124.0 6530.0 6548.0 748.0 61.0
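The NaN in row 1 above appears because that list is one element shorter than the others. A minimal sketch of that padding behaviour, using hypothetical toy data rather than the asker's column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second list is one element shorter than the others.
s = pd.Series([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0]])

# Each inner list becomes one row; shorter rows are padded on the right with NaN.
df2 = pd.DataFrame(s.tolist())

print(df2.shape)
print(df2.isna().sum().sum())  # number of padded (NaN) cells
```

If every array really has length 10, as in the question, no padding should occur and the result has exactly 10 columns.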
Now you just need to access the DataFrame's underlying np.array through .values:
df2.values
array([[2.000e+00, 1.246e+03, 8.200e+01, 4.300e+01, 5.690e+02, 4.600e+01,
4.240e+02],
[3.950e+02, 2.052e+03, 1.388e+03, 8.326e+03, 5.257e+03, 1.760e+02,
nan],
[4.000e+00, 1.000e+00, 1.300e+01, 1.409e+03, 7.742e+03, 2.590e+02,
1.856e+03],
[4.000e+00, 8.700e+01, 1.595e+03, 7.060e+02, 2.935e+03, 6.028e+03,
4.420e+02],
[4.500e+01, 5.820e+02, 1.240e+02, 6.530e+03, 6.548e+03, 7.480e+02,
6.100e+01]])
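Putting the steps together for the shape the question asks about, here is a rough sketch. It assumes every row holds a float array of length 10; the random data and the 5-row size are made up for illustration, and it uses to_numpy(), which returns the same array as .values in recent pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's column: 5 rows, each a float array of length 10.
rng = np.random.default_rng(0)
s = pd.Series([rng.random(10) for _ in range(5)], name="embedding1")

# The Series itself has dtype object, because each element is a whole array.
print(s.dtype)

# Expand each array into a row, then take the underlying 2-D array.
arr = pd.DataFrame(s.tolist()).to_numpy()

print(arr.shape)
print(arr.dtype)
```

With the asker's 1000 rows, the same two calls should give shape (1000, 10) rather than (1000,).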
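You have an array of arrays. The inner arrays have dtype float, but the outer array, the one holding all of the float-dtype array objects, must have dtype object.
You haven't specified enough for us to know, but if I had to guess, I'd say you just need df.values
@RafaelC I've made some edits to the question to explain.
@RafaelC df.values gives the same output
@RafaelC That worked. Could you explain it in detail in your answer?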
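To illustrate the comment about inner float arrays sitting inside an outer object array, here is a small sketch with made-up data. np.stack is not mentioned in the thread; it is shown only as one alternative way to get a 2-D float array, and it assumes all inner arrays have the same length:

```python
import numpy as np

# Build an outer object array whose elements are separate float arrays
# (hypothetical toy data standing in for the asker's column).
inner = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
outer = np.empty(len(inner), dtype=object)
for i, a in enumerate(inner):
    outer[i] = a

# The outer array is 1-D with dtype object; each element is a float64 array.
print(outer.shape, outer.dtype)
print(outer[0].dtype)

# np.stack joins equal-length arrays along a new axis into one 2-D float array.
stacked = np.stack(list(outer))
print(stacked.shape, stacked.dtype)
```

If the inner arrays differ in length, np.stack raises a ValueError, whereas the DataFrame route in the answer pads the short rows with NaN.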