Python: create a matrix with interval labels by binning values
I have bins and data that I want to use to fill an observation matrix:
a = array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])
My expected result is:
     0-13  14-28  29-42  43-57  58-71  72-85  86-100  101-114  115-129  130-144
10      1      0      0      0      0      0       0        0        0        0
26      0      1      0      0      0      0       0        0        0        0
36      0      0      1      0      0      0       0        0        0        0
48      0      0      0      1      0      0       0        0        0        0
64      0      0      0      0      1      0       0        0        0        0
71      0      0      0      0      1      0       0        0        0        0
91      0      0      0      0      0      0       1        0        0        0
pd.cut + pd.get_dummies
Here is one way:
import numpy as np
import pandas as pd
a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])
df = pd.DataFrame({'Values': b})
df['Range'] = pd.cut(df['Values'], a)
dummies = pd.get_dummies(df['Range'])
res = pd.concat([df, dummies], axis=1)
print(res)
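Explanation
- pd.cut: if no labels are provided, default labels derived from the bin ranges are used.
- pd.get_dummies expands a series into "one-hot encoded" format.
- pd.concat lets you join the original dataframe with the output of get_dummies.
- Optionally, you can set Values as the index via res = res.set_index('Values').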
print(res)
    Values       Range  (0, 14]  (14, 29]  (29, 43]  (43, 58]  (58, 72]  \
0       10     (0, 14]        1         0         0         0         0
1       26    (14, 29]        0         1         0         0         0
2       36    (29, 43]        0         0         1         0         0
3       48    (43, 58]        0         0         0         1         0
4       64    (58, 72]        0         0         0         0         1
5       71    (58, 72]        0         0         0         0         1
6       91   (86, 101]        0         0         0         0         0
7      105  (101, 115]        0         0         0         0         0
8      123  (115, 130]        0         0         0         0         0
9      133  (130, 144]        0         0         0         0         0
10     141  (130, 144]        0         0         0         0         0

    (72, 86]  (86, 101]  (101, 115]  (115, 130]  (130, 144]
0          0          0           0           0           0
1          0          0           0           0           0
2          0          0           0           0           0
3          0          0           0           0           0
4          0          0           0           0           0
5          0          0           0           0           0
6          0          1           0           0           0
7          0          0           1           0           0
8          0          0           0           1           0
9          0          0           0           0           1
10         0          0           0           0           1
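The interval labels in the output above show that pd.cut builds right-closed bins (left, right] by default. The following is a minimal sketch of what that means at the bin edges; edge_values is a hypothetical input chosen for illustration and is not part of the original data.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])

# Hypothetical edge-case inputs: the lowest edge, an inner edge,
# the highest edge, and a value beyond the last bin.
edge_values = pd.Series([0, 14, 144, 150])

# Default behaviour: bins are (left, right], so 0 and 150 match no bin (NaN),
# and 14 lands in (0, 14] rather than (14, 29].
default_bins = pd.cut(edge_values, a)

# include_lowest=True makes the first interval include its left edge,
# so 0 is assigned to the first bin; 150 is still out of range.
with_lowest = pd.cut(edge_values, a, include_lowest=True)

# Values that match no bin become all-zero rows in the indicator matrix.
dummies = pd.get_dummies(default_bins)
print(dummies.sum(axis=1).tolist())
```

If your data can contain values exactly on an edge or outside the outer edges, it may be worth checking the row sums like this before relying on the matrix.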
Use pd.cut with custom labels together with pd.get_dummies, and finally set the index from the array b:
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a[:-1].astype(int), a[1:].astype(int))]
d = pd.get_dummies(pd.cut(b, a, labels=labels)).set_index(b)
print(d)
     0-13  14-28  29-42  43-57  58-71  72-85  86-100  101-114  115-129  \
10      1      0      0      0      0      0       0        0        0
26      0      1      0      0      0      0       0        0        0
36      0      0      1      0      0      0       0        0        0
48      0      0      0      1      0      0       0        0        0
64      0      0      0      0      1      0       0        0        0
71      0      0      0      0      1      0       0        0        0
91      0      0      0      0      0      0       1        0        0
105     0      0      0      0      0      0       0        1        0
123     0      0      0      0      0      0       0        0        1
133     0      0      0      0      0      0       0        0        0
141     0      0      0      0      0      0       0        0        0

     130-143
10         0
26         0
36         0
48         0
64         0
71         0
91         0
105        0
123        0
133        1
141        1
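As a cross-check on the matrix above, the same 0/1 values can be reproduced with NumPy alone. This is a sketch of an alternative technique (np.digitize plus index assignment), not part of the answer itself; idx and one_hot are illustrative names, and it assumes every value in b lies within (a[0], a[-1]].

```python
import numpy as np

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

# With right=True, np.digitize returns i such that a[i-1] < x <= a[i],
# which mirrors the default (left, right] bins of pd.cut.
# Subtracting 1 turns that into a zero-based bin index.
idx = np.digitize(b, a, right=True) - 1

# One row per value in b, one column per bin; set a single 1 per row.
one_hot = np.zeros((len(b), len(a) - 1), dtype=int)
one_hot[np.arange(len(b)), idx] = 1

print(idx)
print(one_hot)
```

This returns a plain array without row or column labels, so the pandas version is still the more convenient one when the labelled table is what you need.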
If you want the last label to end at 144, here is a solution:
a1 = a[:-1].astype(int)
a2 = a[1:].astype(int)
a2[-1] += 1
labels = ['{}-{}'.format(i, j - 1) for i, j in zip(a1, a2)]
d = pd.get_dummies(pd.cut(b, a, labels=labels)).set_index(b)
print(d)
     0-13  14-28  29-42  43-57  58-71  72-85  86-100  101-114  115-129  \
10      1      0      0      0      0      0       0        0        0
26      0      1      0      0      0      0       0        0        0
36      0      0      1      0      0      0       0        0        0
48      0      0      0      1      0      0       0        0        0
64      0      0      0      0      1      0       0        0        0
71      0      0      0      0      1      0       0        0        0
91      0      0      0      0      0      0       1        0        0
105     0      0      0      0      0      0       0        1        0
123     0      0      0      0      0      0       0        0        1
133     0      0      0      0      0      0       0        0        0
141     0      0      0      0      0      0       0        0        0

     130-144
10         0
26         0
36         0
48         0
64         0
71         0
91         0
105        0
123        0
133        1
141        1
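One caveat with labels such as 14-28: pd.cut uses right-closed bins by default, so a value of exactly 14 would be counted under 0-13. The sketch below is one possible variant, assuming integer data, that passes right=False so the bins are left-closed and agree with the labels. The names edges and binned are illustrative, and bumping the last edge to 145 is an assumption made here so that 144 itself still falls in the last bin.

```python
import numpy as np
import pandas as pd

a = np.array([0., 14., 29., 43., 58., 72., 86., 101., 115., 130., 144.])
b = np.array([10, 26, 36, 48, 64, 71, 91, 105, 123, 133, 141])

# Integer copy of the edges; raise the last edge by one (assumption)
# so that the value 144 is inside the final left-closed bin [130, 145).
edges = a.astype(int)
edges[-1] += 1

# Labels of the form "lo-(hi - 1)", e.g. "0-13", ..., "130-144".
labels = ['{}-{}'.format(lo, hi - 1) for lo, hi in zip(edges[:-1], edges[1:])]

# right=False gives bins [lo, hi), which match the labels for integer data.
binned = pd.cut(b, edges, right=False, labels=labels)

# astype(int) keeps the 0/1 display on pandas versions that return booleans.
d = pd.get_dummies(binned).astype(int).set_index(b)
print(d)
```

For the sample data this gives the same matrix as above, because none of the values in b sits exactly on a bin edge.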
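I get the same error twice: ValueError: could not convert string to float: 'Range'.
@VasyaPravdin, I suggest you copy-paste the code above into a new Python session without making any changes. If it works there, the problem is in how you adapted the code to your application.
@VasyaPravdin - What about my solution? Same error?
@VasyaPravdin, please see the update; pd.concat should work for you. I think this is a version issue.
Checked. @jpp The updated version works! Thanks for the explanation.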