Python: specifying dtype float32 with pandas.read_csv on pandas 0.10.1

python pandas numpy

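I am trying to read a simple space-separated file with the pandas read_csv method. However, pandas doesn't seem to obey my dtype argument. Maybe I am specifying it incorrectly?

I have distilled my somewhat complicated call to read_csv down to this simple test case. I actually use the converters argument in my "real" scenario, but I removed it here for simplicity.

Below is my ipython session: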

>>> cat test.out
a b
0.76398 0.81394
0.32136 0.91063
>>> import pandas
>>> import numpy
>>> x = pandas.read_csv('test.out', dtype={'a': numpy.float32}, delim_whitespace=True)
>>> x
         a        b
0  0.76398  0.81394
1  0.32136  0.91063
>>> x.a.dtype
dtype('float64')

I have also tried this with a dtype of numpy.int32 or numpy.int64. Those choices result in an exception:

AttributeError: 'NoneType' object has no attribute 'dtype'

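I assume the AttributeError occurs because pandas will not automatically try to convert/truncate the float values to integers.

I am running on a 32-bit machine with a 32-bit version of Python: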

>>> !uname -a
Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
>>> import platform
>>> platform.architecture()
('32bit', 'ELF')
>>> pandas.__version__
'0.10.1'
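For the integer case mentioned above, one option is to read the values as floats and convert them explicitly. The following is only a minimal sketch: the values are hypothetical (not the contents of test.out), and it relies on astype truncating finite floats toward zero.

```python
import numpy as np
import pandas as pd

# Hypothetical float values, not taken from test.out
values = pd.Series([1.76398, 2.91063], name="a")

# astype performs the float -> int conversion explicitly,
# truncating finite values toward zero
as_int = values.astype(np.int32)

print(as_int.tolist())
print(as_int.dtype)
```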

0.10.1 doesn't really support float32 very much.

See this.

You can do it like this in 0.11:

# don't use dtype/converters explicitly for the columns you care about
# they will be converted to float64 if possible, or object if they cannot
df = pd.read_csv('test.csv'.....)

#### this is optional and related to the issue you posted ####
# force anything that is not numeric to NaN
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)

# astype
df[columns] = df[columns].astype('float32')

see http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion

It's not as efficient as doing it directly in read_csv (but that requires
some low-level changes)
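The two steps in this answer (coerce non-numeric values to NaN, then cast) can be sketched on a current pandas as follows. This is an assumption-laden sketch, not the answer's own code: convert_objects is not available in recent pandas releases, so pd.to_numeric with errors="coerce" is swapped in for it, and the inline data (including the bad value "oops") is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical input: column "a" contains one non-numeric value
raw = "a b\n0.76398 0.81394\noops 0.91063\n"
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

columns = ["a", "b"]  # the columns you are interested in

# Stand-in for convert_objects(convert_numeric=True):
# anything that cannot be parsed as a number becomes NaN
df[columns] = df[columns].apply(pd.to_numeric, errors="coerce")

# Cast to the dtype actually wanted
df[columns] = df[columns].astype("float32")

print(df.dtypes)
```

The cast is done after reading, so the data passes through float64 (or object) first, which matches the answer's remark that this is less efficient than handling the dtype inside read_csv.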

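I have confirmed that this works in 0.11-dev (the result is the same on 32-bit and 64-bit).

Under pandas 0.10.1, assigning the dtype directly (df.a.dtype = pd.np.float32) works fine for me.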

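I think this looks similar to …

I think you're right. The AttributeError problem is exactly what the GitHub issue mentions. However, in my other scenario the values are floats, and pandas still doesn't obey the dtype argument when I try to use float32 instead of float64, etc.

FYI, this is in place (implicit) and is not safe with non-float data.

@Jeff Yes, this is an in-place conversion and is not safe for non-float values. df = pd.read_csv('sample.out', converters={'a': lambda x: pd.np.float32(x)}, delim_whitespace=True) doesn't seem to work either. I would have preferred that approach, since it would be a bit better in memory and speed. With the convert_numeric=True argument, convert_objects will set NaN, whereas this approach might raise an exception or cause some other problem if the conversion cannot be done. I haven't looked into the details much, though.

That is the point of convert_numeric=True: to remove "nasty" values from otherwise numeric columns.

Is astype or convert_objects the better approach?

If you need a specific dtype, use astype. convert_objects is better suited to converting from object dtype (and is not as necessary as it was in earlier versions).

So is this considered a bug in pandas? It seems a bit deceptive that I can pass in dtype and get neither what I asked for nor an error.

See my answer; a bug was found in 0.10.1.

+1 for .convert_objects(convert_numeric=True), which solved my problem of having a DataFrame with mixed dtypes and wanting some of the columns parsed as floats.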

In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)

In [6]: x
Out[6]: 
         a        b
0  0.76398  0.81394
1  0.32136  0.91063

In [7]: x.dtypes
Out[7]: 
a    float32
b    float64
dtype: object

In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'

In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
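The same check can be sketched for a current pandas on Python 3. The assumptions here are that io.StringIO replaces the Python 2 StringIO.StringIO, that sep=r"\s+" stands in for delim_whitespace=True, and that data holds the contents of test.out from the question.

```python
import io

import numpy as np
import pandas as pd

# Contents of test.out from the question, inlined as a string
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Request float32 for column "a" only; "b" keeps the default dtype
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")

print(x.dtypes)
```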

In [22]: df.a.dtype = pd.np.float32

In [23]: df.a.dtype
Out[23]: dtype('float32')
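As an alternative to assigning to .dtype, the astype route recommended in the comments returns a new, converted Series and leaves the original untouched until it is assigned back. A minimal sketch, assuming a small DataFrame built from the question's values; nbytes is used only to illustrate the memory difference between the two dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0.76398, 0.32136], "b": [0.81394, 0.91063]})

# astype returns a converted copy; df["a"] is still float64 at this point
a32 = df["a"].astype(np.float32)

print(df["a"].dtype, df["a"].nbytes)  # 8 bytes per value
print(a32.dtype, a32.nbytes)          # 4 bytes per value

# Replace the column explicitly once the conversion is known to be wanted
df["a"] = a32
```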