Python: get the original category values back from codes built from combined columns
I created unique numeric codes from two columns of a DataFrame. Now I want to find the mapping between the numeric codes and the original values. For example:
import pandas as pd

df = pd.DataFrame({"P1":["a","b","c","a"],
                   "P2":["b","c","d","c"],
                   "A":[3,4,5,6]}, index=[2,2,3,3])
print (df)
   A P1 P2
2  3  a  b
2  4  b  c
3  5  c  d
3  6  a  c
cols = ['P1','P2']
df[cols] = (pd.factorize(df[cols].values.ravel())[0]+1).reshape(-1, len(cols))
print (df)
   A P1 P2
2  3  1  2
2  4  2  3
3  5  3  4
3  6  1  3
Now I would like to get this mapping as a dictionary:
a => 1
b => 2
c => 3
d => 4
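How can I get it?

You can index the array of unique values returned by factorize with the array of codes to expand it to one label per cell, zip that with the codes, and convert the result to a dict: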
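# start from the original df, before P1/P2 are overwritten with codes
cols = ['P1','P2']
# a[0] holds the 0-based codes, a[1] the unique values
a = (pd.factorize(df[cols].values.ravel()))
# a[1][a[0]] expands the unique values back to one label per cell
d = dict(zip(a[1][a[0]], a[0]+1))
print (d)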
{'d': 4, 'b': 2, 'c': 3, 'a': 1}
df[cols] = (a[0]+1).reshape(-1, len(cols))
print (df)
   A P1 P2
2  3  1  2
2  4  2  3
3  5  3  4
3  6  1  3
Details (factorize returns a tuple of the codes and the unique values):
print (a)
(array([0, 1, 1, 2, 2, 3, 0, 2], dtype=int64), array(['a', 'b', 'c', 'd'], dtype=object))
print (a[1][a[0]])
['a' 'b' 'b' 'c' 'c' 'd' 'a' 'c']
print (a[0] + 1)
[1 2 2 3 3 4 1 3]
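Since a[1] already lists each unique value once, in order of first appearance, the same dictionary can probably be built without expanding it through a[0]. The following is a minimal sketch under that assumption, run against the original (not yet converted) df; the names codes, uniques, value_to_code and code_to_value are illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"],
                   "P2": ["b", "c", "d", "c"],
                   "A": [3, 4, 5, 6]}, index=[2, 2, 3, 3])
cols = ["P1", "P2"]

# factorize returns (codes, uniques); codes are 0-based positions into uniques
codes, uniques = pd.factorize(df[cols].values.ravel())

# value -> 1-based code, built directly from the uniques array
value_to_code = {value: i for i, value in enumerate(uniques, start=1)}

# code -> value, the reverse lookup the question asks about
code_to_value = {i: value for value, i in value_to_code.items()}

print(value_to_code)
print(code_to_value)
```

Keeping both dictionaries around makes it possible to decode the numeric columns later without re-running factorize.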
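Suggestion: don't go through all those contortions to convert the DataFrame first. Create the mapping, then apply it (note that the codes start at 0 here):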
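import numpy as np
# start from the original df, before P1/P2 are overwritten with codes
orig = pd.unique(df[cols].values.flatten())
code_map = dict(zip(orig, np.arange(orig.size)))
# applymap is deprecated since pandas 2.1; DataFrame.map replaces it there
df[cols] = df[cols].applymap(code_map.__getitem__)
code_map # returns {'a': 0, 'b': 1, 'c': 2, 'd': 3}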
df # returns
   A P1 P2
2  3  0  1
2  4  1  2
3  5  2  3
3  6  0  2
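The mapping-first approach can also produce the 1-based codes asked for in the question. Below is a minimal sketch, assuming a fresh copy of the original df; it applies Series.map column by column rather than applymap, which is one way to sidestep the applymap deprecation in pandas 2.1 and later. The names inverse and decoded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"],
                   "P2": ["b", "c", "d", "c"],
                   "A": [3, 4, 5, 6]}, index=[2, 2, 3, 3])
cols = ["P1", "P2"]

# unique values across both columns, in order of first appearance
orig = pd.unique(df[cols].values.ravel())

# value -> 1-based code
code_map = {value: i for i, value in enumerate(orig, start=1)}

# Series.map looks every cell up in the dict, one column at a time
df[cols] = df[cols].apply(lambda s: s.map(code_map))

# code -> value, used here to decode the columns again as a sanity check
inverse = {code: value for value, code in code_map.items()}
decoded = df[cols].apply(lambda s: s.map(inverse))

print(code_map)
print(df)
print(decoded)
```

Because the dictionary is created before the columns are overwritten, there is no need to reconstruct it from the encoded data afterwards.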