Python: how to add the first value across cascading columns?
Can anyone help me find a better solution than a loop? Suppose the pandas DataFrame below consists of 4 columns. I'm looking for a way to get the same values as in the "Result" column without looping. The logic is as follows:
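- If priority1 = 1, then Result = 1
- If priority1 = 1 and priority2 = 1, then Result = 2 (if priority1 != 1, ignore all the others)
- If priority1 = 1, priority2 = 1 and priority3 = 1, then Result = 3 (if priority1 and priority2 != 1, ignore all the others)
- If priority1 = 1, priority2 = 1, priority3 = 1 and priority4 = 1, then Result = 4 (if priority1, priority2 and priority3 != 1, ignore all the others)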
+-----+-----------+-----------+-----------+-----------+--------+
| | priority1 | priority2 | priority3 | priority4 | Result |
+-----+-----------+-----------+-----------+-----------+--------+
| 0 | | | | | |
| 1 | | 1 | -1 | -1 | |
| 2 | | | | | |
| 3 | | | | 1 | |
| 4 | | | | 1 | |
| 5 | | | | | |
| 6 | | | | -1 | |
| 7 | | | | | |
| 8 | | | | | |
| 9 | 1 | 1 | 1 | 1 | 1 |
| 10 | | | | | |
| 11 | | | | 1 | |
| 12 | | | 1 | | |
| 13 | | | | | |
| 14 | | | -1 | -1 | |
| 15 | | | | | |
| 16 | | | | | |
| 17 | | | | -1 | |
| 18 | | | | | |
| 19 | | | | | |
| 20 | | 1 | 1 | 1 | 2 |
| 21 | | | | | |
| 22 | | | -1 | -1 | |
| 23 | | | | | |
| 24 | | | | | |
| 25 | | | | -1 | |
| 26 | | | | | |
| 27 | | | 1 | 1 | 3 |
| 28 | | | | | |
| 29 | | | | | |
| 30 | | | | -1 | |
| 31 | | | | | |
| 32 | | | | | |
| 33 | | | -1 | -1 | |
| 34 | | | | | |
| 35 | | | 1 | 1 | 4 |
| 36 | | | | | |
| 37 | | | | | |
| 38 | | | | | |
| 39 | | | -1 | -1 | |
| 40 | | | | | |
| 41 | | | | | |
| 42 | | 1 | 1 | 1 | 2 |
| 43 | | | | | |
| 44 | | | | | |
| 45 | | | | -1 | |
| 46 | | | | | |
| 47 | | | | | |
| 48 | | | | | |
| 49 | | | | | |
| 50 | | -1 | -1 | -1 | |
| 51 | | | | | |
| 52 | | | | | |
| 53 | | 1 | 1 | 1 | 2 |
| 54 | | | | | |
| 55 | | | | | |
| 56 | | | | -1 | |
| 57 | | | | | |
| 58 | | | | | |
| 59 | | | -1 | -1 | |
| 60 | | | | | |
| 61 | | | | | |
| 62 | | | 1 | 1 | 3 |
| 63 | | | | | |
| 64 | -1 | -1 | -1 | -1 | -1 |
| 65 | | | | | |
| 66 | | | 1 | 1 | |
| 67 | | | | | |
| 68 | | | | | |
| 69 | | | | | |
| 70 | | | | -1 | |
| 71 | | | | | |
| 72 | | | | | |
| 73 | | | 1 | 1 | |
| 74 | | | | -1 | |
| 75 | | | | | |
| 76 | | | 1 | 1 | |
| 77 | | | | | |
| 78 | | -1 | -1 | -1 | -2 |
| 79 | | | 1 | | |
| 80 | | | 1 | 1 | |
| 81 | | | | | |
| 82 | | | -1 | -1 | -3 |
| 83 | | | 1 | 1 | |
| 84 | | | | | |
| 85 | | | | | |
| 86 | | | | | |
| 87 | | | -1 | -1 | -4 |
| 88 | | | | | |
| 89 | | -1 | -1 | -1 | -2 |
| 90 | | | | | |
| 91 | | | | | |
| 92 | | | | -1 | |
| 93 | | | | | |
| 94 | | | | | |
| 95 | | | 1 | 1 | |
| 96 | | | | | |
| 97 | | | | | |
| 98 | | | | -1 | |
| 99 | | | | 1 | |
| 100 | | | | | |
| 101 | | | -1 | -1 | -3 |
| 102 | | | | | |
| 103 | | | | | |
| 104 | | 1 | 1 | 1 | |
| 105 | | | | | |
| 106 | | | | 1 | |
| 107 | | | | | |
| 108 | | | -1 | -1 | |
| 109 | | | | | |
| 110 | | | | | |
| 111 | | | 1 | 1 | |
| 112 | | | | | |
| 113 | | | | | |
| 114 | | | -1 | -1 | |
| 115 | | | | | |
| 116 | | | 1 | 1 | |
| 117 | | | | | |
| 118 | | | | | |
| 119 | | -1 | -1 | -1 | -2 |
| 120 | | | | | |
| 121 | | | | | |
| 122 | | | | 1 | |
| 123 | | | | | |
| 124 | | | | 1 | |
| 125 | | | | | |
| 126 | | | | | |
| 127 | | | 1 | 1 | |
| 128 | | | | | |
| 129 | | | | | |
| 130 | | | -1 | -1 | -3 |
| 131 | | | | | |
| 132 | | | | | |
| 133 | | | | | |
| 134 | 1 | 1 | 1 | 1 | 1 |
| 135 | | | | -1 | |
| 136 | | | | | |
| 137 | | -1 | -1 | -1 | |
| 138 | | | 1 | | |
| 139 | | | | 1 | |
| 140 | | 1 | 1 | 1 | 2 |
| 141 | | | 1 | 1 | 3 |
| 142 | | | | | |
| 143 | | | | -1 | |
| 144 | | | | | |
| 145 | | | | 1 | 4 |
+-----+-----------+-----------+-----------+-----------+--------+
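For reference, here is a minimal loop-based sketch of the rules above, i.e. the kind of solution the question wants to replace. It assumes, as the answers below do, that the rules apply within each row and that -1 values mirror the +1 case with a negative sign; `cascade_result_loop` is a hypothetical helper name. Some rows of the table (for example rows 20 and 145) appear to depend on earlier rows, so this row-wise reading does not reproduce them.

```python
import numpy as np
import pandas as pd


def cascade_result_loop(row):
    """Hypothetical row-wise reading of the rules.

    Count how many priority columns, starting at priority1, carry the same
    non-zero value as priority1; stop at the first mismatch.
    """
    first = row[0]
    if first not in (1, -1):
        # priority1 did not fire, so every other column is ignored
        return 0
    count = 0
    for value in row:
        if value != first:
            break  # the cascade is broken; later columns are ignored
        count += 1
    return count * first  # carry the sign of priority1


df = pd.DataFrame(
    [[1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [-1, -1, 0, 0]],
    columns=[f"priority{i}" for i in range(1, 5)],
)
# A Python-level loop over rows: the pattern the question wants to avoid
df["Result"] = [cascade_result_loop(row) for row in df.to_numpy()]
print(df)
```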
Setup
import numpy as np
import pandas as pd

df = pd.DataFrame([
[ 1, 0, 0, 0],
[ 1, 1, 0, 0],
[ 1, 1, 1, 0],
[ 1, 1, 1, 1],
[ 0, 1, 1, 1],
[ 0, 0, 1, 1],
[ 0, 0, 0, 1],
[ 1, 0, 1, 1], # this should end up 1
[ 0, 0, 0, 0],
[-1, 0, 0, 0],
[-1, -1, 0, 0],
[-1, -1, -1, 0],
[-1, -1, -1, -1],
[ 0, -1, -1, -1],
[ 0, 0, -1, -1],
[ 0, 0, 0, -1],
], columns=['priority{}'.format(i) for i in range(1, 5)])
Solution
v = df.values
# keep each row's values up to its first zero, then sum them
df.assign(Results=(v * v.cumprod(1).astype(bool)).sum(1))
priority1 priority2 priority3 priority4 Results
0 1 0 0 0 1
1 1 1 0 0 2
2 1 1 1 0 3
3 1 1 1 1 4
4 0 1 1 1 0
5 0 0 1 1 0
6 0 0 0 1 0
7 1 0 1 1 1
8 0 0 0 0 0
9 -1 0 0 0 -1
10 -1 -1 0 0 -2
11 -1 -1 -1 0 -3
12 -1 -1 -1 -1 -4
13 0 -1 -1 -1 0
14 0 0 -1 -1 0
15 0 0 0 -1 0
How it works
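Grab the numpy array
v = df.values
Non-zero values become True with
v.astype(bool)
Cumulative product along each row: stays 1 while every column so far is non-zero, then drops to 0
v.astype(bool).cumprod(1)
Multiply by v to keep only the values that should be added, then sum each row
(v * v.astype(bool).cumprod(1)).sum(1)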
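To see the intermediate arrays, the same steps can be run on two rows taken from the setup frame. The variable names `nonzero`, `still_running` and `kept` are only illustrative; note that `cumprod` on a boolean array returns integers.

```python
import numpy as np

# Two rows from the setup frame: [1, 0, 1, 1] breaks after priority1,
# [-1, -1, -1, 0] runs through priority3.
v = np.array([[1, 0, 1, 1],
              [-1, -1, -1, 0]])

nonzero = v.astype(bool)            # True wherever a priority fired
still_running = nonzero.cumprod(1)  # 1 until the first zero in the row, then 0
kept = v * still_running            # zero out everything after the break
results = kept.sum(1)               # add up the surviving values per row

print(still_running)
print(results)
```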
Naive time test (small data, large data)
Using piRSquared's example frame (hat tip!), I would probably do something like the following:
match = (df.abs() == 1) & (df.eq(df.iloc[:, 0], axis=0))
out = match.cumprod(axis=1).sum(axis=1) * df.iloc[:, 0]
which gives me
In [107]: df["Result"] = out
In [108]: df
Out[108]:
priority1 priority2 priority3 priority4 Result
0 1 0 0 0 1
1 1 1 0 0 2
2 1 1 1 0 3
3 1 1 1 1 4
4 0 1 1 1 0
5 0 0 1 1 0
6 0 0 0 1 0
7 0 0 0 0 0
8 -1 0 0 0 -1
9 -1 -1 0 0 -2
10 -1 -1 -1 0 -3
11 -1 -1 -1 -1 -4
12 0 -1 -1 -1 0
13 0 0 -1 -1 0
14 0 0 0 -1 0
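One design difference is worth checking: this answer only counts values that equal priority1, while the non-zero-prefix approach counts any non-zero value. The two agree on the setup frame, whose rows never mix signs, but a row that mixes +1 and -1 separates them. The rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

cols = [f"priority{i}" for i in range(1, 5)]
# Hypothetical mixed-sign rows, not taken from the question.
df = pd.DataFrame([[1, -1, 0, 0], [-1, -1, 1, 0]], columns=cols)

# Non-zero-prefix version: +1 and -1 inside the prefix are summed together.
v = df.to_numpy()
prefix = (v * v.cumprod(1).astype(bool)).sum(1)

# Match-then-cumprod version: a value only counts if it equals priority1,
# so the run stops at the first sign change.
match = (df.abs() == 1) & df.eq(df.iloc[:, 0], axis=0)
matched = (match.cumprod(axis=1).sum(axis=1) * df.iloc[:, 0]).to_numpy()

print(prefix)
print(matched)
```

Per the rules in the question (priority1 = 1 alone gives 1), the first row arguably should be 1, which is what the equality check produces.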
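Can you explain the logic a bit more? If all priorities equal 1, would the result be 4? In the last row of your table, the result is 4 while only priority4 == 1.
For timing purposes, I took the liberty of boiling it down to this... Nice answer: v = df.values; df.assign(Results=((v == v[:, [0]]) & v).cumprod(1).sum(1) * v[:, 0])
Thanks for the answer and for walking me through how it works :)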
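The timing plots are not reproduced here. Below is a hypothetical harness for rerunning a comparison yourself, assuming synthetic data made of sign-consistent runs (so both expressions should return the same values). `match_first` is a NumPy reading of the match-then-cumprod idea, `nonzero_prefix` is the first answer's expression; both names are illustrative, and timings will vary with machine and library versions.

```python
import timeit

import numpy as np


def nonzero_prefix(v):
    # First answer: keep values up to the first zero, then sum each row
    return (v * v.cumprod(1).astype(bool)).sum(1)


def match_first(v):
    # Count the run of values equal to priority1; bool & int gives 0/1 for 0/+1/-1 data
    return ((v == v[:, [0]]) & v).cumprod(1).sum(1) * v[:, 0]


# Hypothetical benchmark data: each row is a sign times a run of 0-4 ones.
rng = np.random.default_rng(0)
n = 100_000
k = rng.integers(0, 5, size=n)
sign = rng.choice([-1, 1], size=n)
v = (np.arange(4) < k[:, None]).astype(np.int64) * sign[:, None]

for fn in (nonzero_prefix, match_first):
    per_call = timeit.timeit(lambda: fn(v), number=20) / 20
    print(f"{fn.__name__}: {per_call * 1e3:.3f} ms per call")
```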