Python: an efficient way to create columns that reference their own previous values
I am trying to generate some columns in a DataFrame with a datetime index, based on rules that reference the columns' own previous values. I have tried a for loop over the length of the df, as shown below, but I am looking for a cleaner solution if one exists, because what I ultimately want is to get statistics on the generated columns (C, D, E in the example below) across a large number of input columns A, B, ...
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(30, 2), columns=list('AB'))
reset_level = 0.5
df['diff'] = df['A'].diff()
df['C'], df['D'], df['E'] = [0.0, 0.0, 0.0]
for i in range(1, len(df)):
    if abs(df.iloc[i-1]['C'] + df.iloc[i]['diff']) > reset_level:
        df.iat[i, 3] = 0.000
        df.iat[i, 4] = df.iloc[i-1]['C'] + df.iloc[i]['diff']
    else:
        df.iat[i, 3] = df.iloc[i-1]['C'] + df.iloc[i]['diff']
        df.iat[i, 4] = 0.000
    df.iat[i, 5] = 0.5 * df.iloc[i]['D'] * df.iloc[i]['D']
Edit: expected output added below.
A B diff C D E
0 -0.352725 1.429037 NaN 0.000000 0.000000 0.000000
1 -1.024418 -0.644302 -0.671693 0.000000 -0.671693 0.225585
2 0.401065 0.419555 1.425483 0.000000 1.425483 1.016001
3 -1.302484 0.724320 -1.703549 0.000000 -1.703549 1.451039
4 0.427035 0.835221 1.729518 0.000000 1.729518 1.495617
5 0.158694 -0.416741 -0.268340 -0.268340 0.000000 0.000000
6 0.921985 -0.490635 0.763291 0.494951 0.000000 0.000000
7 -0.835297 -1.036580 -1.757282 0.000000 -1.262331 0.796740
8 0.752060 -0.279206 1.587356 0.000000 1.587356 1.259850
9 1.795306 -1.554886 1.043246 0.000000 1.043246 0.544181
10 -0.405100 -0.361454 -2.200406 0.000000 -2.200406 2.420893
11 -0.253629 -0.627245 0.151471 0.151471 0.000000 0.000000
12 -0.820573 -0.212886 -0.566944 -0.415473 0.000000 0.000000
13 0.473439 2.532487 1.294012 0.000000 0.878539 0.385916
14 -1.395435 1.016338 -1.868875 0.000000 -1.868875 1.746346
15 -0.244269 -0.337820 1.151166 0.000000 1.151166 0.662592
16 -2.084977 -1.262249 -1.840708 0.000000 -1.840708 1.694103
17 0.666323 -1.696245 2.751300 0.000000 2.751300 3.784825
18 0.235207 -0.513903 -0.431115 -0.431115 0.000000 0.000000
19 1.386456 -0.149153 1.151249 0.000000 0.720134 0.259296
20 0.093456 -0.298154 -1.293000 0.000000 -1.293000 0.835925
21 0.690499 -1.687416 0.597043 0.000000 0.597043 0.178230
22 1.287530 -1.390260 0.597031 0.000000 0.597031 0.178223
23 1.828138 -0.288829 0.540608 0.000000 0.540608 0.146128
24 0.209666 -0.903385 -1.618472 0.000000 -1.618472 1.309727
25 -1.010678 0.615569 -1.220344 0.000000 -1.220344 0.744619
26 -1.799800 1.536332 -0.789122 0.000000 -0.789122 0.311357
27 0.611096 -1.033066 2.410896 0.000000 2.410896 2.906209
28 -0.532675 -0.091541 -1.143770 0.000000 -1.143770 0.654105
29 2.468137 -1.046117 3.000811 0.000000 3.000811 4.502435
Try this (there is no need to iterate over every row; it handles the whole column at once):
df["C_prev"] = df["C"].shift(1)
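To make the effect of `shift(1)` concrete, here is a minimal sketch on a small made-up column (the values are hypothetical, not taken from the question). Note that `shift` only reads values already stored in `C`, so on its own it does not resolve a rule in which each `C` depends on the `C` just computed for the previous row.

```python
import pandas as pd

# Small hypothetical column standing in for df["C"].
df = pd.DataFrame({"C": [0.0, 0.2, -0.1, 0.4]})

# shift(1) moves every value down one row; the first row has no predecessor
# and therefore becomes NaN.
df["C_prev"] = df["C"].shift(1)
print(df)
```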
I used NumPy arrays in the for loop to store the conditions, and then used np.where to replace the values according to your conditions:
A B diff C D E
0 -0.432513 -0.259526 NaN NaN 0.000000 0.000000
1 -1.120872 -1.572850 -0.688360 0.000000 NaN NaN
2 -0.917555 -2.251316 0.203317 0.203317 0.000000 0.000000
3 -1.869781 -1.284524 -0.952225 0.000000 -0.748908 0.280432
4 -2.041950 -0.091837 -0.172169 -0.172169 0.000000 0.000000
5 -0.142499 0.207746 1.899451 0.000000 1.727282 1.491751
6 1.432833 0.085211 1.575332 0.000000 1.575332 1.240835
7 -2.500191 -0.009907 -3.933025 0.000000 -3.933025 7.734341
8 0.154460 -1.859954 2.654651 0.000000 2.654651 3.523587
9 -0.565057 -0.516736 -0.719517 0.000000 -0.719517 0.258853
10 0.329845 0.127978 0.894902 0.000000 0.894902 0.400425
11 -0.920558 1.254617 -1.250402 0.000000 -1.250402 0.781753
12 -1.396913 0.262378 -0.476355 -0.476355 0.000000 0.000000
13 0.117336 -0.439932 1.514249 0.000000 1.037894 0.538612
14 -0.227066 2.565831 -0.344402 -0.344402 0.000000 0.000000
15 0.077750 0.195277 0.304816 0.304816 0.000000 0.000000
16 1.470611 -0.357213 1.392861 0.000000 1.697677 1.441053
17 -0.553844 0.339270 -2.024455 0.000000 -2.024455 2.049209
18 -0.259603 0.212839 0.294242 0.294242 0.000000 0.000000
19 0.605961 0.279599 0.865564 0.000000 1.159805 0.672574
20 -0.326706 -0.774350 -0.932667 0.000000 -0.932667 0.434934
21 -0.927601 -2.360751 -0.600895 0.000000 -0.600895 0.180537
22 -0.372085 0.986228 0.555516 0.000000 0.555516 0.154299
23 -0.687731 -2.966817 -0.315647 -0.315647 0.000000 0.000000
24 -0.041028 -0.328898 0.646703 0.000000 0.331057 0.054799
25 0.099489 0.275983 0.140517 0.140517 0.000000 0.000000
26 0.468274 -0.287097 0.368785 0.368785 0.000000 0.000000
27 0.497417 -0.588481 0.029143 0.029143 0.000000 0.000000
28 0.603178 2.243163 0.105761 0.105761 0.000000 0.000000
29 -0.643283 -1.051491 -1.246461 0.000000 -1.140700 0.650598
This should be what you want, although you did not provide the expected output.
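The code for this answer did not survive on this page, so the following is only a minimal sketch of the approach it describes, not the answerer's original code. It assumes the rule from the question, keeps the sequential loop but runs it on plain NumPy arrays, and then fills the columns with `np.where`. The seed and the names `total`, `c` and `exceeded` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed, only so the sketch is reproducible
df = pd.DataFrame(np.random.randn(30, 2), columns=list("AB"))
reset_level = 0.5

diff = df["A"].diff().to_numpy()
n = len(df)
total = np.zeros(n)  # previous C plus the current diff, per row
c = np.zeros(n)      # running C, needed to feed the next row

# The recurrence is sequential, so a loop remains, but it works on arrays
# instead of repeated DataFrame lookups. Row 0 keeps its initial zeros.
for i in range(1, n):
    total[i] = c[i - 1] + diff[i]
    if abs(total[i]) <= reset_level:
        c[i] = total[i]  # below the threshold: carry the total forward

# Store the condition once, then fill the columns in vectorized form.
exceeded = np.abs(total) > reset_level
df["diff"] = diff
df["C"] = np.where(exceeded, 0.0, total)
df["D"] = np.where(exceeded, total, 0.0)
df["E"] = 0.5 * df["D"] ** 2
print(df.head())
```

The loop cannot be removed entirely, because each row's condition depends on the `C` produced by the row before it; the gain comes from avoiding per-row `iloc` and `iat` calls on the DataFrame.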
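Then apply a function to each row:
def f(row):
    # Previous row's C plus the current diff.
    # Note: df['C'] is only read here; it is not updated until the assignment below.
    total = df.loc[row.name - 1, 'C'] + row['diff']
    if abs(total) > reset_level:
        C, D = 0.0, total
    else:
        C, D = total, 0.0
    # Use the D just computed; row['D'] still holds the initial 0.0.
    E = 0.5 * D * D
    # Label the result so the assignment aligns on column names.
    return pd.Series({'C': C, 'D': D, 'E': E})
df.loc[1:, ['C', 'D', 'E']] = df[1:].apply(f, axis=1)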
Could you provide the expected output and set a seed for numpy? Otherwise, because the data is random, it is impossible to reproduce the same output and see what went wrong.
Thanks, but the problem is that the values in 'C' are all initialized to 0 when the column is created.
@VolGuy Good catch, I had forgotten. See my updated answer (@SmileyProd's is also very elegant).
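Following up on the comment about 'C' being initialized to 0: one way to avoid reading stale values is to carry the previous C in a local variable rather than looking it up in the DataFrame. The sketch below wraps the question's rule in a function and applies it to every input column to collect statistics, which is what the question ultimately asks for. The function name `reset_columns`, the seed, and the particular statistics are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def reset_columns(values, reset_level):
    """Return (C, D, E) arrays for one input series, using the question's rule."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    c = np.zeros(n)
    d = np.zeros(n)
    prev_c = 0.0  # the C of the previous row, carried forward explicitly
    for i in range(1, n):
        total = prev_c + (values[i] - values[i - 1])
        if abs(total) > reset_level:
            d[i] = total   # threshold exceeded: record in D, reset C
            prev_c = 0.0
        else:
            c[i] = total   # below threshold: keep accumulating in C
            prev_c = total
    e = 0.5 * d ** 2
    return c, d, e


np.random.seed(0)  # assumed seed, only so the sketch is reproducible
df = pd.DataFrame(np.random.randn(30, 2), columns=list("AB"))

# Collect a few example statistics per input column.
stats = {}
for name in df.columns:
    c, d, e = reset_columns(df[name].to_numpy(), reset_level=0.5)
    stats[name] = {
        "C_mean": c.mean(),
        "D_abs_max": np.abs(d).max(),
        "E_sum": e.sum(),
    }

summary = pd.DataFrame(stats).T  # one row per input column
print(summary)
```

Because the function works on plain arrays and returns them, it does not add helper columns to the DataFrame, which keeps things manageable when there are many input columns.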