Python: Defining multiple conditions with np.where

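I am trying to combine several relatively simple conditions into a single np.where clause, but I am having trouble understanding the syntax for the logic.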

My current DataFrame looks like df below and has four columns. I want to add two columns, named and defined as follows:

The desired output is df_so_v2, below.

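  • Days_since_activity: find the most recent previous row with the same ID, then subtract its date from this row's DateOf. If there is no earlier row, return NA.

  • Chg. Avg Value: Condition 1: if Count is 0, return NA. Condition 2: if Count != 0, find the most recent previous row with the same ID and Count != 0, then take the difference in the 'Avg. Value' column.

However, so far I have only built simple np.where queries like the one below, and I don't know how to combine the multiple conditions needed in this case.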

    df['CASH'] = np.where(df['CASH'] != 0, df['CASH'] + commission , df['CASH'])
    
Thanks very much for your help with this.

    import numpy as np
    import pandas as pd
    
    df_dict={'DateOf': ['2017-08-07','2017-08-07','2017-08-07','2017-08-04','2017-08-04','2017-08-04'
                    , '2017-08-03','2017-08-03','2017-08-03','2017-08-02','2017-08-02','2017-08-02','2017-08-01','2017-08-01','2017-08-01'],
        'ID': ['553','559','914','553','559','914','553','559','914','553','559','914','553','559','914'], 'Count': [0, 4, 5, 0, 11, 10, 3, 9, 0,1,0,2,4,4,0],
        'Avg. Value': [0,3.5,2.2,0,4.2,3.3,5.3,5,0,3,0,2,4.4,6.4,0]}
    df_so=pd.DataFrame(df_dict)
    
    df_dict_v2={'DateOf': ['2017-08-07','2017-08-07','2017-08-07','2017-08-04','2017-08-04','2017-08-04'
                    , '2017-08-03','2017-08-03','2017-08-03','2017-08-02','2017-08-02','2017-08-02','2017-08-01','2017-08-01','2017-08-01'],
        'ID': ['553','559','914','553','559','914','553','559','914','553','559','914','553','559','914'], 'Count': [0, 4, 5, 0, 11, 10, 3, 9, 0,1,0,2,4,4,0],
        'Avg. Value': [0,3.5,2.2,0,4.2,3.3,5.3,5,0,3,0,2,4.4,6.4,0],
        'Days_since_activity': [4,3,1,1,1,2,1,2,1,1,1,1,'NA','NA','NA'],
        'Chg. Avg Value': ['NA',-0.7,-1.1,'NA',-0.8,1.3,2.3,-1.4,'NA',-1.4,'NA','NA','NA','NA','NA']
        }
    
    df_so_v2=pd.DataFrame(df_dict_v2)
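To address the question as literally asked: np.where takes one boolean array, so several conditions are combined element-wise with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than == and !=. When there are more than two possible outcomes, np.select takes a list of conditions and a matching list of choices, and uses the first condition that is True. The sketch below applies both to the question's CASH line; the sample rows, the `Type` column, the 'FEE' rule and the `commission` value are all hypothetical, invented only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'CASH' mirrors the question's column; 'Type' is invented.
df = pd.DataFrame({'CASH': [0.0, 10.0, -5.0, 20.0],
                   'Type': ['BUY', 'BUY', 'SELL', 'FEE']})
commission = 1.5  # hypothetical value

# Two conditions combined with & inside a single np.where.
# The parentheses around each comparison are required.
df['CASH_where'] = np.where((df['CASH'] != 0) & (df['Type'] != 'FEE'),
                            df['CASH'] + commission,
                            df['CASH'])

# np.select: conditions are checked in order and the first True one wins;
# rows matching none of them get the default.
conditions = [df['CASH'] == 0,
              df['Type'] == 'FEE']
choices = [np.nan,
           df['CASH'] - commission]
df['CASH_select'] = np.select(conditions, choices,
                              default=df['CASH'] + commission)
print(df)
```

The same pattern (np.select with np.nan as one of the choices) is what the answer below uses for the "Count is 0" rule.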
    

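This answers the first part of the question. I need more clarification on the conditions for part 2.

1) Days_since_activity: find the most recent previous date for the same ID and subtract it from the DateOf column; if there is no earlier value, return NA.

First convert the strings to datetime, then sort the dates in ascending order. Finally, use .transform to compute the difference within each ID.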

    import pandas as pd
    
    df_dict={'DateOf': ['2017-08-07','2017-08-07','2017-08-07','2017-08-04','2017-08-04','2017-08-04'
                    , '2017-08-03','2017-08-03','2017-08-03','2017-08-02','2017-08-02','2017-08-02','2017-08-01','2017-08-01','2017-08-01'],
        'ID': ['553','559','914','553','559','914','553','559','914','553','559','914','553','559','914'], 'Count': [0, 4, 5, 0, 11, 10, 3, 9, 0,1,0,2,4,4,0],
        'Avg. Value': [0,3.5,2.2,0,4.2,3.3,5.3,5,0,3,0,2,4.4,6.4,0]}
    df_so = pd.DataFrame(df_dict)
    df_so['DateOf'] = pd.to_datetime(df_so['DateOf'])
    
    # Sort oldest-first so that diff() subtracts the previous date within each ID
    df_so.sort_values('DateOf', inplace=True)
    df_so['Days_since_activity'] = df_so.groupby(['ID'])['DateOf'].transform(pd.Series.diff)
    # sort_index() returns a new DataFrame, so assign it to restore the original row order
    df_so = df_so.sort_index()
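As an aside, SeriesGroupBy.diff() already returns a result aligned with the original rows, so .transform(pd.Series.diff) and a plain .diff() should produce the same column. The sketch below checks this on a small hypothetical frame (the three rows are made up for illustration, with dates deliberately out of order):

```python
import pandas as pd

# Hypothetical three-row frame: two IDs, dates not in chronological order.
df = pd.DataFrame({'ID': ['553', '559', '553'],
                   'DateOf': pd.to_datetime(['2017-08-04', '2017-08-02', '2017-08-01'])})
df = df.sort_values(['ID', 'DateOf'])

via_transform = df.groupby('ID')['DateOf'].transform(pd.Series.diff)
via_diff = df.groupby('ID')['DateOf'].diff()

# Both give NaT on the first row of each ID and the gap in days elsewhere.
print(pd.DataFrame({'transform': via_transform, 'diff': via_diff}))
```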
    
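Edit based on your comment: find the most recent previous day on which the count was non-zero, and compute the difference from that day.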

    import numpy as np
    import pandas as pd
    
    df_dict={'DateOf': ['2017-08-07','2017-08-07','2017-08-07','2017-08-04','2017-08-04','2017-08-04'
                    , '2017-08-03','2017-08-03','2017-08-03','2017-08-02','2017-08-02','2017-08-02','2017-08-01','2017-08-01','2017-08-01'],
        'ID': ['553','559','914','553','559','914','553','559','914','553','559','914','553','559','914'], 'Count': [0, 4, 5, 0, 11, 10, 3, 9, 0,1,0,2,4,4,0],
        'Avg. Value': [0,3.5,2.2,0,4.2,3.3,5.3,5,0,3,0,2,4.4,6.4,0]}
    
    df = pd.DataFrame(df_dict)
    df['DateOf'] = pd.to_datetime(df['DateOf'], format='%Y-%m-%d')
    
    # Sort by ID, then date, so each ID's rows are in chronological order
    df.sort_values(['ID','DateOf'], inplace=True)
    df['Days_since_activity'] = df.groupby(['ID'])['DateOf'].diff()
    
    # mask: first row of each ID; mask2: the previous row of the same ID had Count == 0
    mask = df.ID != df.ID.shift(1)
    mask2 = df.groupby('ID').Count.shift(1) == 0
    
    # Use .loc rather than chained indexing (df[col][mask] = ...), which may not write back to df
    df.loc[mask, 'Days_since_activity'] = np.nan
    # If the previous row was a zero-count day, compare with the row two steps back instead
    df.loc[mask2, 'Days_since_activity'] = df.groupby(['ID'])['DateOf'].diff(2)
    
    df['Chg. Avg Value'] = df.groupby(['ID'])['Avg. Value'].diff()
    df.loc[mask2, 'Chg. Avg Value'] = df.groupby(['ID'])['Avg. Value'].diff(2)
    
    # Rows whose own Count is 0 get NaN; all other rows keep the computed change
    conditions = [df['Count'] == 0]
    choices = [np.nan]
    df['Chg. Avg Value'] = np.select(conditions, choices, default=df['Chg. Avg Value'])
    
    # df = df.sort_index()
    df
    
New output, not re-sorted back to the original index, for easier comparison:

        DateOf      ID   Count  Avg. Value  Days_since_activity  Chg. Avg Value
    12  2017-08-01  553  4      4.4         NaT                  NaN
    9   2017-08-02  553  1      3.0         1 days               -1.4
    6   2017-08-03  553  3      5.3         1 days               2.3
    3   2017-08-04  553  0      0.0         1 days               NaN
    0   2017-08-07  553  0      0.0         4 days               NaN
    13  2017-08-01  559  4      6.4         NaT                  NaN
    10  2017-08-02  559  0      0.0         1 days               NaN
    7   2017-08-03  559  9      5.0         2 days               -1.4
    4   2017-08-04  559  11     4.2         1 days               -0.8
    1   2017-08-07  559  4      3.5         3 days               -0.7
    14  2017-08-01  914  0      0.0         NaT                  NaN
    11  2017-08-02  914  2      2.0         NaT                  NaN
    8   2017-08-03  914  0      0.0         1 days               NaN
    5   2017-08-04  914  10     3.3         2 days               1.3
    2   2017-08-07  914  5      2.2         3 days               -1.1
    

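Index 11 should be NaT, because the most recent previous row has a count of 0 and there is nothing else to compare it with.

Can you post some desired output? Consider using np.select.

@Dark np.select is the way to go for this, chrisz. The desired output is df_so_v2.

OK. So first add the conditions with np.select, and then do the actual subtraction?

Thanks, Chris. I will add some clarification for part 2. However, your answer does not take the condition on Count into account. For example, for row #0, Days_since_activity should return 4 days, because the count for that ID on the previous day was zero. Does that make sense?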
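The answer's mask2/diff(2) approach only looks one row past a zero-count row, so it would pick the wrong comparison row if an ID had two or more zero-count days in a row. A more general sketch follows. It uses a different technique from the answer's: blank out inactive rows with Series.where, shift by one row within each ID, then forward-fill within each ID, so every row sees the most recent earlier row with Count != 0. It assumes the rule described in the comments, namely that both new columns are measured from the most recent earlier row of the same ID with a non-zero count. The names `active`, `prev_date` and `prev_avg` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'DateOf': ['2017-08-07'] * 3 + ['2017-08-04'] * 3 + ['2017-08-03'] * 3
              + ['2017-08-02'] * 3 + ['2017-08-01'] * 3,
    'ID': ['553', '559', '914'] * 5,
    'Count': [0, 4, 5, 0, 11, 10, 3, 9, 0, 1, 0, 2, 4, 4, 0],
    'Avg. Value': [0, 3.5, 2.2, 0, 4.2, 3.3, 5.3, 5, 0, 3, 0, 2, 4.4, 6.4, 0],
})
df['DateOf'] = pd.to_datetime(df['DateOf'])
df = df.sort_values(['ID', 'DateOf'])

active = df['Count'] != 0  # hypothetical helper: rows that count as "activity"

# Keep the date/value only on active rows (NaT/NaN elsewhere), shift one row
# within each ID so a row never compares with itself, then forward-fill within
# each ID so each row carries the most recent *earlier* active row's values.
prev_date = (df['DateOf'].where(active)
             .groupby(df['ID']).shift()
             .groupby(df['ID']).ffill())
prev_avg = (df['Avg. Value'].where(active)
            .groupby(df['ID']).shift()
            .groupby(df['ID']).ffill())

df['Days_since_activity'] = df['DateOf'] - prev_date
# Condition 1 from the question: rows with Count == 0 get NaN.
df['Chg. Avg Value'] = np.where(active, df['Avg. Value'] - prev_avg, np.nan)

print(df.sort_index())  # restore the original row order for display
```

On this sample data, which never has two zero-count days in a row for the same ID, the result matches the answer's output table; the two approaches only diverge when such runs occur.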