Python: filter by date and/or conditions
I am using pandas to try to count the number of members who purchased specific types of contract between two dates. The dataframe I am working with looks like this:
Member Nbr Contract-Type Date-Joined
20 1 Year Membership 2011-08-01
3128 3 Month Membership 2011-07-22
3535 4 Month Membership 2015-02-18
3760 4 Month Membership 2010-02-28
3762 3 Month Membership 2010-01-31
3882 1 Month Membership 2010-04-24
3892 3 Month Membership 2010-03-24
4116 3 Month Membership 2014-12-02
4700 1 Month Membership 2014-11-11
4802 4 Month Membership 2014-07-26
5004 1 Year Membership 2012-03-12
5020 1 Year Membership 2010-07-28
5022 3 Month Membership 2010-06-25
5130 1 Year Membership 2011-01-04
...
If I am only interested in a single contract type, I am able to get the count:
print(len(df[(df['Date-Joined'] > '2010-01-01')
             & (df['Date-Joined'] < '2012-02-01')
             & (df['Contract-Type'] == '1 Year Membership')]))
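However, when I try to combine the date range with more than one contract type using an or condition, I get the following error: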
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Replacing the or condition with an and condition returns 0.
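Use | instead of or. Python's or tries to reduce each whole Series to a single True or False, which is what raises the ValueError, whereas | combines the conditions element by element. Also, & takes precedence over |, so your logic needs an extra set of parentheses around the alternatives.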
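To make the two points above concrete in isolation, here is a minimal sketch. The Series a, b and c are hypothetical three-element masks invented for this illustration; they are not taken from the question's data.

```python
import pandas as pd

# Hypothetical boolean masks, made up only for this illustration.
a = pd.Series([True, False, True])
b = pd.Series([False, False, True])
c = pd.Series([True, True, False])

try:
    a or b  # "or" calls bool(a), which pandas refuses for a multi-element Series
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# "|" and "&" work element by element instead.
elementwise_or = a | b
elementwise_and = a & b

# "&" binds tighter than "|", so the first line parses as (a & b) | c.
ungrouped = a & b | c
grouped = a & (b | c)  # the intended grouping has to be spelled out

print(ungrouped.tolist())  # [True, True, True]
print(grouped.tolist())    # [True, False, True]
```

The two results differ in the middle element, which is the same effect that lets out-of-range rows through when the contract-type alternatives are not wrapped in their own parentheses.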
import io
import pandas as pd
data = io.StringIO('''\
Member Nbr,Contract-Type,Date-Joined
20,1 Year Membership,2011-08-01
3128,3 Month Membership,2011-07-22
3535,4 Month Membership,2015-02-18
3760,4 Month Membership,2010-02-28
3762,3 Month Membership,2010-01-31
3882,1 Month Membership,2010-04-24
3892,3 Month Membership,2010-03-24
4116,3 Month Membership,2014-12-02
4700,1 Month Membership,2014-11-11
4802,4 Month Membership,2014-07-26
5004,1 Year Membership,2012-03-12
5020,1 Year Membership,2010-07-28
5022,3 Month Membership,2010-06-25
5130,1 Year Membership,2011-01-04
''')
df = pd.read_csv(data)
print(df[
(df['Date-Joined'] > '2010-01-01') &
(df['Date-Joined'] < '2012-02-01') &
(df['Contract-Type'] == '1 Year Membership')
])
# Member Nbr Contract-Type Date-Joined
# 0 20 1 Year Membership 2011-08-01
# 11 5020 1 Year Membership 2010-07-28
# 13 5130 1 Year Membership 2011-01-04
print(df[
(df['Date-Joined'] > '2010-01-01') &
(df['Date-Joined'] < '2012-02-01') &
(df['Contract-Type'] == '1 Year Membership') |
(df['Contract-Type'] == '4 Month Membership')
])
# Member Nbr Contract-Type Date-Joined
# 0 20 1 Year Membership 2011-08-01
# 2 3535 4 Month Membership 2015-02-18 <====== BEWARE!
# 3 3760 4 Month Membership 2010-02-28
# 9 4802 4 Month Membership 2014-07-26 <====== BEWARE!
# 11 5020 1 Year Membership 2010-07-28
# 13 5130 1 Year Membership 2011-01-04
print(df[
(df['Date-Joined'] > '2010-01-01') &
(df['Date-Joined'] < '2012-02-01') &
((df['Contract-Type'] == '1 Year Membership') |
(df['Contract-Type'] == '4 Month Membership'))
])
# Member Nbr Contract-Type Date-Joined
# 0 20 1 Year Membership 2011-08-01
# 3 3760 4 Month Membership 2010-02-28
# 11 5020 1 Year Membership 2010-07-28
# 13 5130 1 Year Membership 2011-01-04
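Since the question asks for a count rather than the rows themselves, the following is one possible variation, offered as a sketch rather than as part of the answer above. It assumes the dates are parsed with parse_dates so they compare as timestamps instead of strings, and it swaps the chained == / | comparisons for Series.isin, which avoids the precedence issue altogether. The names wanted, in_range and mask are introduced here purely for illustration.

```python
import io
import pandas as pd

data = io.StringIO('''\
Member Nbr,Contract-Type,Date-Joined
20,1 Year Membership,2011-08-01
3128,3 Month Membership,2011-07-22
3535,4 Month Membership,2015-02-18
3760,4 Month Membership,2010-02-28
3762,3 Month Membership,2010-01-31
3882,1 Month Membership,2010-04-24
3892,3 Month Membership,2010-03-24
4116,3 Month Membership,2014-12-02
4700,1 Month Membership,2014-11-11
4802,4 Month Membership,2014-07-26
5004,1 Year Membership,2012-03-12
5020,1 Year Membership,2010-07-28
5022,3 Month Membership,2010-06-25
5130,1 Year Membership,2011-01-04
''')
# Assumption: parse the date column so comparisons are made on real timestamps.
df = pd.read_csv(data, parse_dates=['Date-Joined'])

# Contract types to count (illustrative choice matching the example above).
wanted = ['1 Year Membership', '4 Month Membership']

# Strictly between the two dates, as in the original comparisons.
in_range = (
    (df['Date-Joined'] > pd.Timestamp('2010-01-01'))
    & (df['Date-Joined'] < pd.Timestamp('2012-02-01'))
)

# isin replaces the chain of == comparisons joined with |.
mask = in_range & df['Contract-Type'].isin(wanted)

# True counts as 1, so summing the mask gives the number of matching members.
count = int(mask.sum())
print(count)  # 4
```

Adding another contract type then only means extending the wanted list, with no further parentheses to keep track of.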