Pandas: How do I generate a Category-typed DataFrame column from a string column?


I can convert a pandas string column to a Categorical, but when I try to insert it as a new DataFrame column it seems to get converted right back to a Series of str:

train['LocationNFactor'] = pd.Categorical.from_array(train['LocationNormalized'])

>>> type(pd.Categorical.from_array(train['LocationNormalized']))
<class 'pandas.core.categorical.Categorical'>
# however it got converted back to...
>>> type(train['LocationNFactor'][2])
<type 'str'>
>>> train['LocationNFactor'][2]
'Hampshire'


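My guess is that this happens because Categorical does not map to any numpy dtype. Do I therefore have to convert it to some int type and lose the factor labels<->levels association? What is the most elegant workaround for storing the levels<->labels association while keeping the ability to convert back? (Just store it as a dict and convert manually when needed?) Unlike R, I suppose.

(Using pandas 0.10.1, numpy 1.6.2, python 2.7.3: the latest macports versions of everything.)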

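The labels<->levels mapping is stored in the Index object.

To convert an integer array to a string array: index[integer_array]
To convert a string array to an integer array: index.get_indexer(string_array)

Here are some examples: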

In [56]: c = pd.Categorical.from_array(['a', 'b', 'c', 'd', 'e'])
    ...: idx = c.levels

In [57]: idx[[1,2,1,2,3]]
Out[57]: Index([b, c, b, c, d], dtype=object)

In [58]: idx.get_indexer(["a","c","d","e","a"])
Out[58]: array([0, 2, 3, 4, 0])
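
The session above uses the names from pandas 0.10 (`Categorical.from_array`, `.levels`). As a rough sketch, assuming a recent pandas release where the constructor is `pd.Categorical(...)` and the level index is exposed as `.categories`, the same two conversions might look like this:

```python
import pandas as pd

# Assumption: a recent pandas release; `pd.Categorical(...)` and `.categories`
# stand in for the older `Categorical.from_array(...)` and `.levels`.
c = pd.Categorical(['a', 'b', 'c', 'd', 'e'])
idx = c.categories  # Index holding the distinct level strings

# integer array -> string array
strings = idx[[1, 2, 1, 2, 3]]

# string array -> integer array
codes = idx.get_indexer(['a', 'c', 'd', 'e', 'a'])

print(list(strings))
print(codes.tolist())
```

The lookup logic is unchanged from the answer; only the attribute names differ.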

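The only workaround I found for pandas before 0.15 is as follows:

The column must be converted to a Categorical for the classifier, but numpy will immediately coerce the levels back to int, losing the factor information.

So store the factor in a global variable outside the DataFrame.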

[Update: pandas]
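
For readers on pandas 0.15 or later, a column can hold the `category` dtype directly, so the external bookkeeping may no longer be needed. The following is a minimal sketch under that assumption; the sample place names are made up for illustration, and only the column names come from the question:

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
train = pd.DataFrame(
    {'LocationNormalized': ['London', 'Hampshire', 'Hampshire', 'Surrey']}
)

# Assumption: pandas >= 0.15, where 'category' is a supported column dtype.
train['LocationNFactor'] = train['LocationNormalized'].astype('category')

dtype_name = str(train['LocationNFactor'].dtype)
codes = train['LocationNFactor'].cat.codes            # integer code per row
categories = train['LocationNFactor'].cat.categories  # Index of level strings

# Round trip: integer codes back to the original strings.
restored = categories.take(codes.to_numpy())

print(dtype_name)
print(codes.tolist())
print(list(restored))
```

Note that an individual element such as `train['LocationNFactor'][2]` still comes back as a plain string; it is the column's dtype, not the element type, that shows whether the categorical information survived.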
I know that, but the problem here is that when we assign to a DataFrame column, everything goes back to str, as I showed:
train['LocationNFactor'] = pd.Categorical…
How should I rewrite my code?
train_LocationNFactor = pd.Categorical.from_array(train['LocationNormalized'])  # default order: alphabetical
train['LocationNFactor'] = train_LocationNFactor.labels  # insert in dataframe
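
The same workaround can be sketched with current attribute names, assuming a recent pandas release where `.labels` is exposed as `.codes` and `.levels` as `.categories`. The variable `location_factor` and the sample data are hypothetical; the idea of keeping the Categorical outside the DataFrame is the one described above:

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
train = pd.DataFrame(
    {'LocationNormalized': ['London', 'Hampshire', 'Hampshire', 'Surrey']}
)

# Keep the Categorical outside the DataFrame so the level strings are retained.
location_factor = pd.Categorical(train['LocationNormalized'])  # alphabetical order by default

# Store only the integer codes in the DataFrame.
train['LocationNFactor'] = location_factor.codes

# Later, map the stored integers back to their strings via the kept categories.
decoded = location_factor.categories[train['LocationNFactor'].to_numpy()]

print(train['LocationNFactor'].tolist())
print(list(decoded))
```

This keeps the integer column usable by a classifier while the external `location_factor` object preserves the mapping needed to convert back.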