Python: handling an out-of-bounds index by placing NaN


Suppose I have this DataFrame:

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col2': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col3': [1,0,1,0,0,-1,1,-1,-1,1,0,1,1,1,1]})
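I want to run a loop that checks every row of "col3" for a 1. If a 1 is found, the loop should use the next row's inputs for some calculations and place the result one row lower (I don't know how to shift the index while appending, so I shift the final result instead).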

The code is as follows:

balance = []
cum_sum = 0
profits = []
hit = 0

for i in range(len(df)):
    if df['col3'][i] == 1:
        cum_sum += (df['col1'][i+1] + (df['col2'][i+1]))
        balance.append(cum_sum)
    else:
        balance.append(None)

    if df['col3'][i] == 1:
        transactions = df['col1'][i+1] + df['col2'][i+1]
        profits.append(transactions)
    else:
        profits.append(None)
    
df['profits'] = profits
df['profits'] = df['profits'].shift(1)
df['balance'] = balance
df['balance'] = df['balance'].shift(1)

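The problem is that when the last element of column "col3" is 1, the code tries to access a nonexistent index to get the inputs for the calculation, which raises an index-out-of-bounds error: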

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-53-a4361f78dd64> in <module>
      6 for i in range(len(df)):
      7     if df['col3'][i] == 1:
----> 8         cum_sum += (df['col1'][i+1] + (df['col2'][i+1]))
      9         balance.append(cum_sum)
     10     else:

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4403         k = self._convert_scalar_indexer(k, kind="getitem")
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:
   4407             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 15

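After that, I get an error saying that the length of the values does not match the length of the index when I try to put the appended values back into the DataFrame. (The original DataFrame is date-indexed.)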
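If the loop itself is to be kept, one option is to guard the look-ahead so that row i+1 is only read when it exists. The following is a minimal sketch under that assumption. The names `n` and `transaction` and the use of `.iloc` for positional access are illustrative choices, not part of the original code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col2': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col3': [1,0,1,0,0,-1,1,-1,-1,1,0,1,1,1,1]})

profits = []
balance = []
cum_sum = 0
n = len(df)

for i in range(n):
    # Only look ahead when row i+1 actually exists; otherwise store NaN.
    if df['col3'].iloc[i] == 1 and i + 1 < n:
        transaction = df['col1'].iloc[i + 1] + df['col2'].iloc[i + 1]
        cum_sum += transaction
        profits.append(transaction)
        balance.append(cum_sum)
    else:
        profits.append(np.nan)
        balance.append(np.nan)

# Both lists always have len(df) entries, so the assignment cannot fail
# with a length mismatch; shift(1) moves each result one row down.
df['profits'] = pd.Series(profits, index=df.index).shift(1)
df['balance'] = pd.Series(balance, index=df.index).shift(1)
```

Because `.iloc` is positional, this sketch should also work when the DataFrame is date-indexed, where `df['col1'][i+1]` would look up a label rather than a position.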


Use shift to check col3 and assign the sum of the values, then use cumsum:

df["profits"] = df.loc[df["col3"].shift().eq(1),["col1","col2"]].sum(1)
df["balance"] = df["profits"].cumsum()
print (df)

    col1  col2  col3  profits  balance
0      1     1     1      NaN      NaN
1      2     2     0      4.0      4.0
2      3     3     1      NaN      NaN
3      4     4     0      8.0     12.0
4      5     5     0      NaN      NaN
5      6     6    -1      NaN      NaN
6      7     7     1      NaN      NaN
7      8     8    -1     16.0     28.0
8      9     9    -1      NaN      NaN
9     10    10     1      NaN      NaN
10    11    11     0     22.0     50.0
11    12    12     1      NaN      NaN
12    13    13     1     26.0     76.0
13    14    14     1     28.0    104.0
14    15    15     1     30.0    134.0
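To see why this never reads past the end, it may help to break the one-liner into steps. This is only an illustrative sketch; the intermediate names `mask` and `profits` are not in the answer above:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col2': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col3': [1,0,1,0,0,-1,1,-1,-1,1,0,1,1,1,1]})

# shift() moves col3 down one row, so the mask is True on every row whose
# *previous* row has col3 == 1. The first row compares NaN with 1 and is
# False; a 1 in the last row has no following row and is simply ignored.
mask = df["col3"].shift().eq(1)

# Summing only the masked rows gives a shorter Series. Assigning it back
# aligns on the index, so every unmasked row becomes NaN.
profits = df.loc[mask, ["col1", "col2"]].sum(axis=1)
df["profits"] = profits

# cumsum skips NaN by default and leaves NaN at those positions.
df["balance"] = df["profits"].cumsum()
```

Since the mask is built by shifting the whole column rather than by indexing i+1, there is no position that can fall outside the frame.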

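What is your expected output?
Ideally, NaN in "profits" and "balance" in place of the out-of-bounds values.
Please provide a [...], as well as the current and expected output.
Thanks for the answer. I haven't tried it yet, but what if I want to subtract (or use some other operation) instead of summing?
Use [...].
Hi, the summing works great, but when I try subtraction, where I want the difference of col1 and col2 whenever a 1 is selected: df["profits"] = df.loc[df["col3"].shift().eq(1),["col1"].sub("col2") does not work.
Use df.loc[df["col3"].shift().eq(1),"col1"].sub(df.loc[df["col3"].shift().eq(1),"col2"]).
Thanks! It finally works! It feels a bit bad, though, when the nearly 20 lines of code I wrote get reduced to just 2 :(. Again, thanks a lot!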
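For reference, the subtraction from the comments can be written with the mask computed once. This is a minimal sketch; the names `mask` and `alt`, and the `where` variant, are illustrative additions rather than something proposed in the thread:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col2': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                   'col3': [1,0,1,0,0,-1,1,-1,-1,1,0,1,1,1,1]})

# Rows that directly follow a row with col3 == 1.
mask = df["col3"].shift().eq(1)

# Same idea as the comment above: col1 - col2 on the masked rows only;
# the remaining rows become NaN when the result is aligned on assignment.
df["profits"] = df.loc[mask, "col1"].sub(df.loc[mask, "col2"])

# Alternative: compute the difference everywhere, then blank out the
# rows outside the mask with where().
alt = (df["col1"] - df["col2"]).where(mask)
```

With the sample data col1 and col2 are identical, so every masked row holds 0.0. Other element-wise operations (`mul`, `div`, and so on) should follow the same pattern.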
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-289cc6ecc295> in <module>
     17         profits.append(None)
     18 
---> 19 df['profits'] = profits
     20 df['profits'] = df['profits'].shift(1)
     21 df['balance'] = balance

~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   2936         else:
   2937             # set column
-> 2938             self._set_item(key, value)
   2939 
   2940     def _setitem_slice(self, key, value):

~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2998 
   2999         self._ensure_valid_index(value)
-> 3000         value = self._sanitize_column(key, value)
   3001         NDFrame._set_item(self, key, value)
   3002 

~\anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3634 
   3635             # turn me into an ndarray
-> 3636             value = sanitize_index(value, self.index, copy=False)
   3637             if not isinstance(value, (np.ndarray, Index)):
   3638                 if isinstance(value, list) and len(value) > 0:

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
    609 
    610     if len(data) != len(index):
--> 611         raise ValueError("Length of values does not match length of index")
    612 
    613     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
df2 = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,''],
                   'col2': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,''],
                   'col3': [1,0,1,0,0,-1,1,-1,-1,1,0,1,1,1,1,np.nan],
                   'profits': [np.nan, 4, np.nan, 8, np.nan, np.nan, np.nan, 16, np.nan, np.nan, 22, np.nan, 26, 28, 30, np.nan],
                   'balance': [np.nan, 4, np.nan, 12, np.nan, np.nan, np.nan, 28, np.nan, np.nan, 50, np.nan, 76, 104, 134, np.nan]})