
Python: best way to split data into 3 classes

I have a numpy array like this:

[['6.5' '3.2' '5.1' '2.0' 'Iris-virginica'] 
['6.1' '2.8' '4.0' '1.3' 'Iris-versicolor'] 
['4.6' '3.2' '1.4' '0.2' 'Iris-setosa']
['6.0' '2.2' '4.0' '1.0' 'Iris-versicolor']
['4.7' '3.2' '1.3' '0.2' 'Iris-setosa']
['6.7' '3.1' '5.6' '2.4' 'Iris-virginica']]
What is the fastest way to split this data into 3 separate numpy arrays based on the labels "Iris-virginica", "Iris-setosa", and "Iris-versicolor"?

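The Iris-virginica array should contain only

[['6.5' '3.2' '5.1' '2.0']
['6.7' '3.1' '5.6' '2.4']]

The Iris-setosa array should contain only

[['4.6' '3.2' '1.4' '0.2']
['4.7' '3.2' '1.3' '0.2']]

The Iris-versicolor array should contain only

[['6.1' '2.8' '4.0' '1.3']
['6.0' '2.2' '4.0' '1.0']]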
Use numpy and a list comprehension:

import numpy as np

# Each row: four measurements (as strings) followed by the class label
data = [['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
        ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
        ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
        ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
        ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
        ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica']]

# Keep the rows of one class and convert their measurements to float.
# list(...) is needed on Python 3, where map returns an iterator.
filtered = [list(map(float, item[:4])) for item in data if item[4] == 'Iris-virginica']
print('mean', np.mean(filtered, axis=0))
print('var ', np.var(filtered, axis=0))
Here, item[4] == 'Iris-virginica' selects the rows you want, map(float, item[:4]) converts the measurements from str to float, and np.mean(..., axis=0) computes the column-wise mean of the filtered data.

The output (printed when only the first three columns were selected with [:3]) is

mean [ 6.6   3.15  5.35]
var  [ 0.01    0.0025  0.0625]

Update

Here is a numpy-only version, though on this six-row sample it appears to be slower than the version above:

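data = np.array(data)
# Boolean mask on the label column, then take all four measurement columns.
# The np.float alias was removed in NumPy 1.24; the builtin float works everywhere.
filtered = data[data[:, 4] == 'Iris-virginica'][:, :4].astype(float)
print('mean', np.mean(filtered, axis=0))
print('var ', np.var(filtered, axis=0))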
The timeit results are:

In [5]: %timeit filtered = [map(float, item[:4]) for item in data if item[4] == 'Iris-virginica']
100000 loops, best of 3: 1.93 µs per loop

In [6]: data = np.array(data)

In [7]: timeit data[data[:, 4] == 'Iris-virginica'][:, :4].astype(np.float)
100000 loops, best of 3: 15.5 µs per loop
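
The snippets above filter a single label. To get all three arrays the question asks for, the same boolean mask can be applied once per label. This is only a minimal sketch: `split_by_label` is a hypothetical helper name, and it assumes the label sits in the last column and every other column parses as a float.

```python
import numpy as np

data = np.array([
    ['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
    ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
    ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
    ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
    ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
    ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica'],
])


def split_by_label(arr):
    """Hypothetical helper: map each label to a float array of its rows.

    Assumes the label is the last column and the rest are numeric strings.
    """
    labels = arr[:, -1]
    return {
        str(label): arr[labels == label, :-1].astype(float)
        for label in np.unique(labels)
    }


groups = split_by_label(data)
for label, rows in groups.items():
    # Per-class column means, as in the single-class example
    print(label, rows.mean(axis=0))
```

Each value in `groups` is an ordinary 2-D float array, so `np.mean(..., axis=0)` and `np.var(..., axis=0)` can be applied to it directly.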

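It's unclear what you're asking. You have duplicated "classes" with two different values. Also, how do you want the covariance to be computed? With respect to each of the other "classes"?

@Ffisegydd: I changed the question.

FWIW this would be simpler using pandas; you basically want groupby + mean, IIUC.

Thanks for your help. I changed my question slightly.

I think you want [:4], not [:3], otherwise you skip a column of data.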
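
The groupby + mean suggestion from the comments could look roughly like this in pandas. This is a sketch, not the commenter's code: the column names are assumptions introduced for illustration (the source array has no header), and `ddof=0` is passed so the variance matches what `np.var` computes by default.

```python
import pandas as pd

rows = [
    ['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
    ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
    ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
    ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
    ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
    ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica'],
]

# Assumed column names; the original data has none
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
df = pd.DataFrame(rows, columns=columns)

# The measurements arrive as strings, so convert them before aggregating
df[columns[:4]] = df[columns[:4]].astype(float)

means = df.groupby('species').mean()
variances = df.groupby('species').var(ddof=0)  # population variance, like np.var

print(means)
print(variances)
```

If the separate per-class arrays are still needed, iterating over `df.groupby('species')` yields one `(label, sub-frame)` pair per class.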