Python: in a DataFrame, how do I count the number of times a condition occurs in a column?


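Background

I have five years of NO2 measurement data in csv files, one file per location and year. I have loaded all the files into dataframes with the same format: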

         Date Hour Location  NO2_Level
0  01/01/2016   00   Street         18
1  01/01/2016   01   Street         39
2  01/01/2016   02   Street        129
3  01/01/2016   03   Street         76
4  01/01/2016   04   Street         40
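
Goal

For each dataframe, count the number of times NO2_Level is greater than 150 and output that value.

So I wrote a loop that creates all the dataframes from the right directory and cleans them appropriately.

Problem

Whatever I have tried, spot checks tell me the results are wrong, for example:
- every location in a given year gets the same count (possible, but unlikely)
- in a year where I know the count should be positive, every location returns 0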

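What I've tried

I have tried a number of ways to get this value for each dataframe, such as turning the column into a Series:

NO2_Level = pd.Series(df['NO2_Level'])
count = (NO2_Level > 150).sum()

and using pd.count().

These two approaches came closest to the output I want.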
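
Since both `.sum()` and `count()` come up above, it may help to see how they differ on a boolean mask. The sketch below uses made-up readings (not the asker's data) and only standard pandas Series methods; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical readings; two of the five exceed the threshold of 150
no2 = pd.Series([18, 151, 129, 176, 40], name="NO2_Level")

mask = no2 > 150  # boolean Series: True where the condition holds

n_true = int(mask.sum())    # True counts as 1, so this is the number of exceedances
n_rows = int(mask.count())  # count() counts non-missing entries, True or False alike

print(n_true)  # 2
print(n_rows)  # 5
```

So if `count()` is applied to the mask, it reports the number of rows rather than the number of rows meeting the condition; `.sum()` is the one that answers "how many times".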

Example to test

import pandas as pd

data = {'Date': ['01/01/2016', '01/02/2016', '01/03/2016', '01/04/2016', '01/05/2016'],
        'Hour': ['00', '01', '02', '03', '04'],
        'Location': ['Street', 'Street', 'Street', 'Street', 'Street'],
        'NO2_Level': [18, 39, 129, 76, 40]}
df = pd.DataFrame(data=data)
NO2_Level = pd.Series(df['NO2_Level'])
count = (NO2_Level > 150).sum()
count

Expected output

So I am trying to get one line of output per dataframe, in the format Location,year,count (of condition).

So the example above would produce a line like:

Street, 2016, 1
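
As a rough illustration of that output format, here is a minimal sketch of a helper that builds the line from a single dataframe. The function name `summarise`, the sample readings, and the day/month/year date format are all assumptions made for this example; they are not taken from the asker's actual loop.

```python
import pandas as pd

def summarise(df, threshold=150):
    """Return 'Location,year,count' for one single-location, single-year frame."""
    location = df["Location"].iloc[0]
    # Assumption: dates are day/month/year strings such as '01/01/2016'
    dates = pd.to_datetime(df["Date"].str.strip(), format="%d/%m/%Y")
    year = dates.dt.year.iloc[0]
    count = int((df["NO2_Level"] > threshold).sum())
    return f"{location},{year},{count}"

# Hypothetical readings with exactly one value above 150
sample = pd.DataFrame({
    "Date": ["01/01/2016", "01/01/2016", "02/01/2016"],
    "Hour": ["00", "01", "00"],
    "Location": ["Street", "Street", "Street"],
    "NO2_Level": [18, 151, 129],
})

line = summarise(sample)
print(line)  # Street,2016,1
```

Because everything is derived from the frame passed in, each call depends only on its own input, which makes it easier to rule out a value being carried over from a previous loop iteration.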
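
Actual values: every location gets the same result within a given year, and for some years (2014) the count does not seem to work at all, although a check shows there should be some: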

Kirkstall Road,2013,47
Haslewood Close,2013,47
Tilbury Terrace,2013,47
Corn Exchange,2013,47
Temple Newsam,2014,0
Queen Street Morley,2014,0
Corn Exchange,2014,0
Tilbury Terrace,2014,0
Haslewood Close,2015,43
Tilbury Terrace,2015,43
Corn Exchange,2015,43
Jack Lane Hunslet,2015,43
Norman Rows,2015,43

Here is a solution using a (randomly) generated sample:
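
The code for this answer is not shown on this page. The following is only a sketch of one way to get a per-location, per-year count like the one in the answer's output, assuming a random sample with the columns `Date`, `Location` and `NOE_level`. The seed, value range and location names are illustrative choices, and this is not the answerer's original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sample is reproducible
n = 20
df = pd.DataFrame({
    "Date": rng.choice([2015, 2016, 2017, 2018], size=n),
    "Location": rng.choice(["street", "road", "avenue", "town", "campaign"], size=n),
    "NOE_level": rng.integers(130, 200, size=n),
})

# Flag each row, then sum the flags per (Location, Date) group.
# Groups without any exceedance stay in the result with a count of 0.
result = (
    df.assign(exceeds=df["NOE_level"] > 150)
      .groupby(["Location", "Date"])["exceeds"]
      .sum()
      .reset_index(name="count")
)
print(result)
```

Summing a boolean flag per group, rather than filtering rows first, is what allows zero counts to appear in the result.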


Generated sample df:

        Date  Location  NOE_level
0       2018      town        191
1       2017  campaign        187
2       2017      town        137
3       2016    avenue        148
4       2017  campaign        195
5       2018      town        181
6       2018      road        187
7       2018      town        184
8       2016      town        155
9       2016    street        183
10      2018      road        136
11      2017      road        171
12      2018    street        165
13      2015    avenue        193
14      2016  campaign        170
15      2016    street        132
16      2016  campaign        165
17      2015      road        161
18      2018      road        161
19      2015      road        140

Output:

    Location       Date  count
0     avenue       2015      1
1     avenue       2016      0
2   campaign       2016      2
3   campaign       2017      2
4       road       2015      1
5       road       2017      1
6       road       2018      2
7     street       2016      1
8     street       2018      1
9       town       2016      1
10      town       2017      0
11      town       2018      3

Hope this helps.

import pandas as pd

ddict = {
    'Date':['2016-01-01','2016-01-01','2016-01-01','2016-01-01','2016-01-01','2016-01-02',],
    'Hour':['00','01','02','03','04','02'],
    'Location':['Street','Street','Street','Street','Street','Street',],
    'N02_Level':[19,39,129,76,40, 151],
}

df = pd.DataFrame(ddict)

# Convert dates to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Make a Year column (as a string such as '2016')
df['Year'] = df['Date'].dt.strftime('%Y')

# Keep rows with N02_Level > 150, then count them per location and year
df1 = df[df['N02_Level'] > 150].groupby(['Location','Year']).size().reset_index(name='Count')

# Iterate over the results
for i in range(len(df1)):
    loc = df1['Location'][i]
    yr = df1['Year'][i]
    cnt = df1['Count'][i]
    print(f'{loc},{yr},{cnt}')


# Alternative without f-strings
for i in range(len(df1)):
    print('{loc},{yr},{cnt}'.format(loc=df1['Location'][i], yr=df1['Year'][i], cnt=df1['Count'][i]))

Sample data:

        Date Hour Location  N02_Level
0 2016-01-01   00   Street         19
1 2016-01-01   01   Street         39
2 2016-01-01   02   Street        129
3 2016-01-01   03   Street         76
4 2016-01-01   04   Street         40
5 2016-01-02   02   Street        151

Output:

Street,2016,1
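
One detail of the filter-then-group approach above is that a location/year pair with no reading over 150 drops out of the result entirely. If a line with a count of 0 is wanted for such pairs, a possible variation is sketched below; the second location "Lane" and its readings are invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Street", "Street", "Lane", "Lane"],
    "Year": ["2016", "2016", "2016", "2016"],
    "N02_Level": [19, 151, 40, 76],
})

# Filtering first: groups with no exceedance disappear from the result
filtered = (
    df[df["N02_Level"] > 150]
    .groupby(["Location", "Year"]).size().reset_index(name="Count")
)

# Summing a boolean flag per group: every group is kept, zeros included
flagged = (
    df["N02_Level"].gt(150)
    .groupby([df["Location"], df["Year"]]).sum()
    .reset_index(name="Count")
)

# itertuples avoids indexing each column by position inside the loop
for row in flagged.itertuples(index=False):
    print(f"{row.Location},{row.Year},{row.Count}")
```

Here `filtered` contains only the Street row, while `flagged` also reports Lane with a count of 0.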

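Hi, may I make a suggestion that would make it easier for people to help you? Instead of posting the whole process, from loading to cleaning to logic, post only the relevant part: a small piece of sample code with a small sample input and an expected output (there is a standard name for this kind of example). Even though you posted your actual and expected output, we don't have the initial data to reproduce it, so it's hard to know how to get there.

Your code looks perfect to me!

Thanks RafaelC, good point. I've tried to tidy it up now; hopefully it's better.

Thanks hacker315!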