Consecutive negative numbers in a Python DataFrame


python function dataframe


Here is a sample of my dataset:

                        fvc  pef  fev1  fev1_fvc  fev6  fev25_75  fvc_changes
Date        Time                                        
2017-03-14  19:27:14    2.7  3.7  1.7   0.63      1.8   0.9         0.00
2017-03-15  11:35:21    3.1  2.8  2.0   0.65      2.2   1.2        14.81
2017-03-16  15:37:02    2.8  2.6  1.8   0.62      1.9   1.0         3.70
2017-03-17  17:11:16    2.8  3.1  1.9   0.66      2.0   1.2         3.70
2017-03-18  20:29:35    2.9  3.4  1.8   0.64      2.0   1.0         7.41
2017-03-19  21:53:09    2.2  4.1  1.5   0.65      2.2   0.8       -18.52
            21:54:23    2.4  4.1  1.7   0.71      1.8   1.2       -11.11
2017-03-20  14:36:24    2.3  4.1  1.6   0.69      1.7   1.0       -14.81
2017-03-21  22:36:43    2.1  4.0  1.4   0.63      1.4   0.8       -22.22
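To try the answers below, the sample can be rebuilt as a DataFrame with a two-level (Date, Time) index. This is a sketch under one assumption: Date and Time are stored as plain strings, whereas the real data may hold datetime or time objects. The blank Date on the 21:54:23 row is read as 2017-03-19.

```python
import pandas as pd

# Rows copied from the sample table; Date and Time are assumed to be strings
columns = ["Date", "Time", "fvc", "pef", "fev1", "fev1_fvc", "fev6", "fev25_75", "fvc_changes"]
rows = [
    ("2017-03-14", "19:27:14", 2.7, 3.7, 1.7, 0.63, 1.8, 0.9, 0.00),
    ("2017-03-15", "11:35:21", 3.1, 2.8, 2.0, 0.65, 2.2, 1.2, 14.81),
    ("2017-03-16", "15:37:02", 2.8, 2.6, 1.8, 0.62, 1.9, 1.0, 3.70),
    ("2017-03-17", "17:11:16", 2.8, 3.1, 1.9, 0.66, 2.0, 1.2, 3.70),
    ("2017-03-18", "20:29:35", 2.9, 3.4, 1.8, 0.64, 2.0, 1.0, 7.41),
    ("2017-03-19", "21:53:09", 2.2, 4.1, 1.5, 0.65, 2.2, 0.8, -18.52),
    # the blank Date in the table belongs to the row above: 2017-03-19
    ("2017-03-19", "21:54:23", 2.4, 4.1, 1.7, 0.71, 1.8, 1.2, -11.11),
    ("2017-03-20", "14:36:24", 2.3, 4.1, 1.6, 0.69, 1.7, 1.0, -14.81),
    ("2017-03-21", "22:36:43", 2.1, 4.0, 1.4, 0.63, 1.4, 0.8, -22.22),
]

# Date becomes index level 0 and Time level 1, matching the layout above
df = pd.DataFrame(rows, columns=columns).set_index(["Date", "Time"])
print(df)
```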

This is the function I wrote to get to this point:

def fvc_changes(df, fvc_base=2.7):
    # percentage change of fvc from the baseline; pandas applies the
    # arithmetic to the whole column at once, so no for loop is needed
    changes = ((df['fvc'] - fvc_base) / fvc_base) * 100

    # add result into new column: fvc_changes (rounded to 2 decimals)
    df['fvc_changes'] = changes.round(2)
    return df
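Because pandas arithmetic works element-wise on a whole Series, the percentage change can be computed without looping over the DataFrame. A minimal sketch, using a hypothetical helper name `fvc_percent_change` and the fvc values from the sample table, reproduces the fvc_changes column shown above:

```python
import pandas as pd


def fvc_percent_change(fvc, fvc_base=2.7):
    """Hypothetical helper: percentage change of each fvc value from the baseline."""
    # vectorised: the formula is applied to every element of the Series at once
    return ((fvc - fvc_base) / fvc_base * 100).round(2)


# fvc values from the sample table
fvc = pd.Series([2.7, 3.1, 2.8, 2.8, 2.9, 2.2, 2.4, 2.3, 2.1])
print(fvc_percent_change(fvc).tolist())
# [0.0, 14.81, 3.7, 3.7, 7.41, -18.52, -11.11, -14.81, -22.22]
```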

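I would like to extend this function so that it:

1. goes through the fvc_changes column from start to end and checks whether each value is less than -10;
2. writes "EXACERBATION" in a new column of the same dataframe when it meets the third consecutive value below -10;
3. evaluates only the final fvc_changes value of any given date, i.e. if a date has two fvc_changes values, only the second one is evaluated.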

The final dataframe should look like this:

                        fvc  pef  fev1  fev1_fvc  fev6  fev25_75  fvc_changes  exacerbation
Date        Time                                        
2017-03-14  19:27:14    2.7  3.7  1.7   0.63      1.8   0.9         0.00 
2017-03-15  11:35:21    3.1  2.8  2.0   0.65      2.2   1.2        14.81
2017-03-16  15:37:02    2.8  2.6  1.8   0.62      1.9   1.0         3.70
2017-03-17  17:11:16    2.8  3.1  1.9   0.66      2.0   1.2         3.70
2017-03-18  20:29:35    2.9  3.4  1.8   0.64      2.0   1.0         7.41
2017-03-19  21:53:09    2.2  4.1  1.5   0.65      2.2   0.8       -18.52
            21:54:23    2.4  4.1  1.7   0.71      1.8   1.2       -11.11
2017-03-20  14:36:24    2.3  4.1  1.6   0.69      1.7   1.0       -14.81
2017-03-21  22:36:43    2.1  4.0  1.4   0.63      1.4   0.8       -22.22        EXACERBATION
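A minimal sketch of the three requirements in one function. The name `flag_exacerbation` and its `threshold`/`streak` parameters are hypothetical, and it assumes the index levels are named Date and Time as in the sample. "Consecutive" here means consecutive recorded dates, so a date with no reading does not break a streak. For a date with several readings, the sketch writes the label only on that date's last row, which is a choice the question leaves open.

```python
import numpy as np
import pandas as pd


def flag_exacerbation(df, threshold=-10, streak=3):
    """Hypothetical helper: label days whose last fvc_changes value is below
    `threshold` on at least `streak` consecutive recorded days."""
    dates = df.index.get_level_values("Date")
    # requirement 3: one value per day, the last reading of that day
    daily = df.groupby(level="Date")["fvc_changes"].last()
    # requirement 1: is the day's value below the threshold?
    below = daily < threshold
    # a new run id starts every time `below` switches between True and False
    run_id = below.ne(below.shift()).cumsum()
    # running count of flagged days inside each run (stays 0 in runs of False)
    count = below.astype(int).groupby(run_id).cumsum()
    # requirement 2: the third (streak-th) and later days of a run
    flagged_days = count >= streak

    # put the label only on the last row of each flagged day
    is_last_row = ~dates.duplicated(keep="last")
    is_flagged = np.asarray(dates.map(flagged_days), dtype=bool)
    out = df.copy()
    out["exacerbation"] = np.where(is_last_row & is_flagged, "EXACERBATION", "")
    return out
```

On the sample data this should label only the 2017-03-21 row, matching the desired output above.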

I think you can do this in a few steps, although there may be a smarter way:

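import pandas as pd
import numpy as np

# test the last fvc_changes value of each day (index level 0 is Date)
# and broadcast the result to all of that day's rows
df['exacerbation'] = df.groupby(level=0).fvc_changes.transform('last') < -10
# new run id whenever the True/False flag changes from one row to the next;
# the first row gets its own id, so there is no NaN left to replace
run_id = df.exacerbation.ne(df.exacerbation.shift()).cumsum()
# running count of flagged rows inside each run
df['exacerbation'] = df.exacerbation.astype(int).groupby(run_id).cumsum() > 3

df['exacerbation'] = np.where(df.exacerbation, 'EXACERBATION', '')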
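The groupby key built with `astype('int').diff().abs().cumsum()` gives every run of identical True/False values its own number. A small sketch on a made-up boolean Series (the values are illustrative only) shows that key, why its first element is NaN (the reason for the `replace(np.NaN, False)` step), and the `ne`/`shift` variant that numbers the first run too. The per-run `cumsum` then counts consecutive True values.

```python
import pandas as pd

# made-up flags: True means "this value is below -10"
flags = pd.Series([False, True, True, False, True, True, True])

# key as built in the answer: absolute changes of the 0/1 value, accumulated
diff_key = flags.astype(int).diff().abs().cumsum()
# alternative key: start a new run whenever the value differs from the previous one
shift_key = flags.ne(flags.shift()).cumsum()

# running count of True values inside each run
streak = flags.astype(int).groupby(shift_key).cumsum()

print(diff_key.tolist())   # [nan, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
print(shift_key.tolist())  # [1, 2, 2, 3, 4, 4, 4]
print(streak.tolist())     # [0, 1, 2, 0, 1, 2, 3]
```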

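Edit: I think the logic above may not be entirely correct. Here is a slightly different approach that should work. The code above counts several readings from the same day as separate entries in a streak; this method counts only the last reading of each day. You can see in the output that, although the last 4 rows have negative values, they span only 2 days, so they are not flagged.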

import pandas as pd
import numpy as np

# test the last fvc_changes value of each day (index level 0 is Date)
df['exacerbation'] = df.groupby(level=0).fvc_changes.transform('last') < -10
# keep only one row per day: its last reading
df2 = df.reset_index().drop_duplicates('Date', keep='last')
# new run id whenever the day-level flag switches between True and False
run_id = df2.exacerbation.ne(df2.exacerbation.shift()).cumsum()
# count consecutive flagged days inside each run; the third and later days are True
df2['exacerbation'] = df2.exacerbation.astype(int).groupby(run_id).cumsum() >= 3

# copy each day's result back onto every row of that day
day_flag = df2.set_index('Date')['exacerbation']
df['exacerbation'] = df.index.get_level_values('Date').map(day_flag)

df['exacerbation'] = np.where(df.exacerbation, 'EXACERBATION', '')
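The change from the first version is the `drop_duplicates('Date', keep='last')` step, which reduces each day to its last reading before streaks are counted. A small sketch on two days from the sample (values copied from the table; string dates are an assumption) shows that step next to an equivalent `groupby(level='Date').last()`. The two agree as long as rows are in time order within each day and fvc_changes has no missing values, because `last()` skips NaN while `drop_duplicates` keeps the positional last row.

```python
import pandas as pd

# three readings over two days, values taken from the sample table
idx = pd.MultiIndex.from_tuples(
    [("2017-03-19", "21:53:09"), ("2017-03-19", "21:54:23"), ("2017-03-20", "14:36:24")],
    names=["Date", "Time"],
)
df = pd.DataFrame({"fvc_changes": [-18.52, -11.11, -14.81]}, index=idx)

# as in the answer: move Date/Time into columns, then keep the last row per Date
last_rows = df.reset_index().drop_duplicates("Date", keep="last")
# the same values through a groupby on the Date level of the index
last_values = df.groupby(level="Date")["fvc_changes"].last()

print(last_rows)
print(last_values)
```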

Nice use of lambda :)

@Mahf_i, actually I think there may be a small error in the logic. I'll try to see if I can fix it.


                     fvc_changes  exacerbation
Date       Time                               
2017-03-14 19:27:14         0.00              
2017-03-15 11:35:21        14.81              
2017-03-16 15:37:02         3.70              
2017-03-17 17:11:16         3.70              
2017-03-18 20:29:35         7.41              
2017-03-19 20:53:09       -12.52              
           21:53:09       -18.52              
           21:54:23       -11.11              
2017-03-20 14:36:24       -14.81              
2017-03-21 22:36:43       -22.22  EXACERBATION
2017-03-24 17:11:16         3.70              
2017-03-25 20:29:35         7.41              
2017-03-26 21:53:09       -18.52              
2017-03-27 21:54:23       -11.11              
2017-03-28 14:36:24       -14.81  EXACERBATION
2017-03-29 22:36:43       -22.22  EXACERBATION
2017-03-30 22:36:43        22.22              
2017-04-02 20:53:09       -12.52              
           21:53:09       -18.52              
           21:54:23       -11.11              
2017-04-03 14:36:24       -14.81
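A different technique, not used in either answer: once there is one flag per day, a rolling sum over the last three rows equals 3 exactly when the current day and the two recorded days before it are all below -10, which is the same condition as a run count of at least 3. A sketch using the last reading of each day from the output above (the window counts recorded days, not calendar days):

```python
import pandas as pd

# last fvc_changes reading of each day, taken from the output table above
daily = pd.Series(
    [0.00, 14.81, 3.70, 3.70, 7.41, -11.11, -14.81, -22.22, 3.70,
     7.41, -18.52, -11.11, -14.81, -22.22, 22.22, -11.11, -14.81],
    index=["2017-03-14", "2017-03-15", "2017-03-16", "2017-03-17", "2017-03-18",
           "2017-03-19", "2017-03-20", "2017-03-21", "2017-03-24", "2017-03-25",
           "2017-03-26", "2017-03-27", "2017-03-28", "2017-03-29", "2017-03-30",
           "2017-04-02", "2017-04-03"],
)

streak = 3
# 1 for a day below -10, else 0
below = (daily < -10).astype(int)
# sum over a window of `streak` rows; the first streak-1 windows are NaN and compare False
exacerbation = below.rolling(streak).sum() == streak
print(exacerbation[exacerbation].index.tolist())
# ['2017-03-21', '2017-03-28', '2017-03-29']
```

This flags the same three days as the EXACERBATION rows in the output above.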