Python: How to conditionally count the rows of a DataFrame based on the values in another column of the same DataFrame?


I have a DataFrame whose rows I want to count conditionally:

     TIME  VALUE Prev_Time
0   23:01      0       NaN
1   23:02      0       NaN
2   23:03      1     23:02
3   23:04      0       NaN
4   23:05      0       NaN
5   23:06      1     23:05
6   23:07      0       NaN
7   23:08      0       NaN
8   23:09      0       NaN
9   23:10      0       NaN
10  23:11      1     23:10
11  23:12      0       NaN
12  23:13      0       NaN
13  23:14      0       NaN
14  23:15      0       NaN
15  23:16      1     23:15
16  23:17      0       NaN
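
I want to count rows based on a condition on the "Prev_Time" column, so that:

  • On the first iteration, counting starts at the first row and stops at the row just before the first one where "Prev_Time" has a value.
  • On the second and every later iteration, counting starts at, and includes, the row where the time is printed.

The desired output should be: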

       ROW_COUNT
    0          2
    1          3
    2          5
    3          5
    4          2
    
I also need the total count, something like len(df), printed as:

    Total Count: 5
    

This works. You can adapt the code to your needs, but the basic idea holds:

    import numpy as np
    import pandas as pd

    # Dummy data set
    df1 = pd.DataFrame({'TIME': np.arange(17), 'VALUE': np.arange(-17, 0), 'Prev_time': [np.nan, np.nan, 1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, np.nan, 5, np.nan]})
    # Keep the rows where Prev_time is not null; reset_index() stores
    # their original positions in a new 'index' column
    df = df1[df1['Prev_time'].notnull()].reset_index()
    # If the last row of df1 is null, the trailing group has no end marker,
    # so append a sentinel row whose 'index' is len(df1)
    if df.loc[len(df) - 1, 'index'] != len(df1) - 1:
        df.loc[len(df)] = [len(df1), 0, 0, 0]
    # The gap between consecutive boundary positions is the size of each group
    count = df['index'] - df['index'].shift(1).fillna(0)
    print(count)
    print('Total Count:', len(count))
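
The boundary-difference idea above can also be sketched without appending a sentinel row to the frame. This is only an illustration on the same dummy `Prev_time` values; the names `starts`, `bounds`, and `counts` are made up for the example and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Same dummy Prev_time values as in the answer above
prev_time = pd.Series([np.nan, np.nan, 1, np.nan, np.nan, 2, np.nan, np.nan,
                       np.nan, np.nan, 4, np.nan, np.nan, np.nan, np.nan, 5, np.nan])

# Positions of the rows where Prev_time is present; each one starts a new group
starts = np.flatnonzero(prev_time.notna().to_numpy())

# Group boundaries: start of the data, every marker row, end of the data
bounds = np.concatenate(([0], starts, [len(prev_time)]))

# Distance between neighbouring boundaries is the row count of each group
counts = np.diff(bounds)

# If row 0 is itself a marker, the leading group is empty; drop it
counts = counts[counts > 0]

print(counts)
print("Total Count:", len(counts))
```

Because the end of the data is always added as a boundary, this variant does not need a special case for whether the last row is null.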
    


This may not be a perfect answer, but it gets you what you want:

    import pandas as pd

    # Read the data
    d = pd.read_csv('stackdata.txt')

    # The last row has to act as a boundary, so give it a value
    # (.loc avoids the chained-assignment pitfall of d['Prev_Time'][...] = 1)
    d.loc[len(d) - 1, 'Prev_Time'] = 1

    # Keep only the rows where Prev_Time is not null
    ds = d[d.Prev_Time.notnull()]

    # Reset the index; the old index is kept in a new column named 'index'
    ds = ds.reset_index()
    # Take only that newly added index column
    dst = ds[ds.columns[0]]

    # Difference between consecutive boundary positions
    dstr = dst.diff()

    # The first diff is NaN, so replace it with the first boundary position
    dstr[0] = dst[0]

    # Add 1 to the last item so the final group includes the last row
    dstr[len(dstr) - 1] += 1
    len(dstr)
    

Find the relevant rows:

    notnull=df[df.VALUE>0]
    """
         TIME  VALUE Prev_Time
    2   23:03      1     23:02
    5   23:06      1     23:05
    10  23:11      1     23:10
    15  23:16      1     23:15
    """
    
Split at those rows using np.split:

    row_counts=pd.DataFrame({'ROW_COUNT':[len(x) for x in np.split(df,notnull.index)]})
    """
       ROW_COUNT
    0          2
    1          3
    2          5
    3          5
    4          2
    """
    
And the count:

    len(row_counts)
    """
    5
    """
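
As an alternative sketch that stays within pandas, the same grouping can be expressed with a cumulative sum over the non-null `Prev_Time` markers followed by a `groupby`. Depending on the installed pandas and NumPy versions, calling `np.split` directly on a DataFrame may emit a deprecation warning, which is one reason to consider this form. The inline `raw` string and the name `group_id` are assumptions made for the example; the data is the table from the question.

```python
import io
import pandas as pd

# The sample table from the question, loaded from an inline string
raw = """TIME VALUE Prev_Time
23:01 0 NaN
23:02 0 NaN
23:03 1 23:02
23:04 0 NaN
23:05 0 NaN
23:06 1 23:05
23:07 0 NaN
23:08 0 NaN
23:09 0 NaN
23:10 0 NaN
23:11 1 23:10
23:12 0 NaN
23:13 0 NaN
23:14 0 NaN
23:15 0 NaN
23:16 1 23:15
23:17 0 NaN
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Every row with a Prev_Time value starts a new group, so the running
# count of such rows gives each row its group number (0 before the first)
group_id = df["Prev_Time"].notna().cumsum()

# Number of rows in each group, in order of appearance
row_counts = (
    df.groupby(group_id)
      .size()
      .reset_index(drop=True)
      .to_frame("ROW_COUNT")
)

print(row_counts)
print("Total Count:", len(row_counts))
```

The cumulative sum places each marker row in the group it opens, which matches the rule in the question that later counts include the row where the time is printed.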
    

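Does the Prev_Time column already exist, or are you asking how to create that column and then count the rows where Prev_Time has a value?
@Grr Yes, the "Prev_Time" column already exists.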