Python: How can I conditionally count rows in a DataFrame based on the values in another column of the same DataFrame?
Tags: python, numpy, pandas
I have a DataFrame containing rows that I want to count conditionally:
     TIME  VALUE Prev_Time
0   23:01      0       NaN
1   23:02      0       NaN
2   23:03      1     23:02
3   23:04      0       NaN
4   23:05      0       NaN
5   23:06      1     23:05
6   23:07      0       NaN
7   23:08      0       NaN
8   23:09      0       NaN
9   23:10      0       NaN
10  23:11      1     23:10
11  23:12      0       NaN
12  23:13      0       NaN
13  23:14      0       NaN
14  23:15      0       NaN
15  23:16      1     23:15
16  23:17      0       NaN
I want to count the rows based on a condition on the "Prev_Time" column, so that I get:
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
I also need the total count (something like len(df)), which should print:
Total Count: 5
This works. You can adapt the code to your needs, but the basic idea is sound:
import numpy as np
import pandas as pd

# Dummy data set
df1 = pd.DataFrame({'TIME': np.arange(17), 'VALUE': np.arange(-17, 0), 'Prev_time': [np.nan, np.nan, 1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, np.nan, 5, np.nan]})
# Keep the rows where Prev_time is not null; reset_index() stores
# their original row numbers in a new 'index' column
df = df1[df1['Prev_time'].notnull()].reset_index()
# If the last row of df1 is null, it is not in df yet,
# so append len(df1) manually as the final boundary
if df.loc[len(df) - 1]['index'] != (len(df1) - 1):
    df.loc[len(df)] = [len(df1), 0, 0, 0]
# The difference between consecutive boundaries is the row count of each block
count = df['index'] - df['index'].shift(1).fillna(0)
len(count)
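The same boundary-difference idea can be sketched with NumPy directly on the question's data. This is only an illustration under assumptions: the frame is rebuilt by hand from the table in the question, and the names `marked`, `starts` and `bounds` are made up for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (assumption: rows 2, 5, 10 and 15 carry a Prev_Time).
times = [f"23:{m:02d}" for m in range(1, 18)]
marked = {2, 5, 10, 15}
df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in marked else 0 for i in range(17)],
    "Prev_Time": [times[i - 1] if i in marked else np.nan for i in range(17)],
})

# Positions of the rows where Prev_Time is set.
starts = np.flatnonzero(df["Prev_Time"].notna().to_numpy())
# Block boundaries: start of the frame, every marked row, end of the frame.
bounds = np.concatenate(([0], starts, [len(df)]))
# Row count of each block; drop an empty leading block if row 0 is marked.
counts = np.diff(bounds)
counts = counts[counts > 0]

row_counts = pd.DataFrame({"ROW_COUNT": counts})
print(row_counts)
print("Total Count:", len(row_counts))
```

Here every marked row starts a new block, so no special case is needed for a null last row.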
This may not be a perfect answer, but it gets you what you want:
import pandas as pd
# Read the data
d = pd.read_csv('stackdata.txt')
# The last row has to be identified as a boundary, so give it a value
# (.loc avoids chained assignment, which may not modify d)
d.loc[len(d) - 1, 'Prev_Time'] = 1
# Get all the rows where Prev_Time is not null
ds = d[d.Prev_Time.notnull()]
# Reset the index; this adds a column named 'index' with the original row numbers
ds = ds.reset_index()
# Keep only the newly added index column
dst = ds[ds.columns[0]]
# Take the difference between consecutive row numbers
dstr = dst.diff()
# The first difference is NaN, so assign it the first row number instead
dstr[0] = dst[0]
# Add 1 to the last item so that the final row itself is counted
dstr[len(dstr) - 1] += 1
len(dstr)
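The answer above reads the table from a file whose format is not shown. As a rough sketch, assuming the data is whitespace-separated exactly as printed in the question, it can be loaded from a string with `sep=r"\s+"`; because the header has one field fewer than the data rows, pandas uses the first column as the index. The variable name `raw` is just for illustration.

```python
import io
import pandas as pd

# The question's table pasted as text (assumption: whitespace-separated).
raw = """\
TIME VALUE Prev_Time
0 23:01 0 NaN
1 23:02 0 NaN
2 23:03 1 23:02
3 23:04 0 NaN
4 23:05 0 NaN
5 23:06 1 23:05
6 23:07 0 NaN
7 23:08 0 NaN
8 23:09 0 NaN
9 23:10 0 NaN
10 23:11 1 23:10
11 23:12 0 NaN
12 23:13 0 NaN
13 23:14 0 NaN
14 23:15 0 NaN
15 23:16 1 23:15
16 23:17 0 NaN
"""

# Split on runs of whitespace; the unnamed first column becomes the index.
d = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Rows where Prev_Time is set; "NaN" in the text is parsed as a missing value.
marked_rows = d[d["Prev_Time"].notna()]
print(marked_rows)
```

With `d` loaded this way, the remaining steps of the answer can be applied unchanged.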
Find the relevant rows:
notnull=df[df.VALUE>0]
"""
TIME VALUE Prev_Time
2 23:03 1 23:02
5 23:06 1 23:05
10 23:11 1 23:10
15 23:16 1 23:15
"""
Split at those rows with np.split:
row_counts=pd.DataFrame({'ROW_COUNT':[len(x) for x in np.split(df,notnull.index)]})
"""
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
"""
And the total count:
len(row_counts)
"""
5
"""
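Passing a DataFrame to np.split may trigger a deprecation warning in newer pandas versions. A possible alternative, sketched here under the assumption that every non-null Prev_Time starts a new block, labels the blocks with a cumulative sum and counts them with groupby. The frame is rebuilt by hand from the question's table, and `block_id` is a name invented for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (assumption: rows 2, 5, 10 and 15 carry a Prev_Time).
times = [f"23:{m:02d}" for m in range(1, 18)]
marked = {2, 5, 10, 15}
df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in marked else 0 for i in range(17)],
    "Prev_Time": [times[i - 1] if i in marked else np.nan for i in range(17)],
})

# The running count of non-null Prev_Time values gives each block its own label.
block_id = df["Prev_Time"].notna().cumsum()

# Size of each block, renumbered from 0 to match the desired output.
row_counts = (
    df.groupby(block_id)
      .size()
      .reset_index(drop=True)
      .to_frame("ROW_COUNT")
)
print(row_counts)
print("Total Count:", len(row_counts))
```

This keeps everything inside pandas, so the block sizes come back as a regular DataFrame without splitting the original frame into pieces.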
Does the Prev_Time column already exist, or are you asking how to create that column and then count the rows where Prev_Time has a value?
@Grr Yes, the "Prev_Time" column already exists.