Python: counting occurrences of a string in a DataFrame with a DatetimeIndex
I have a dataframe with a time series like this:
timestamp IceCreamOrder Location
2018-01-03 02:21:16 Chocolate South
2018-01-03 12:41:12 Vanilla North
2018-01-03 14:32:15 Strawberry North
2018-01-03 15:32:15 Strawberry North
2018-01-04 02:21:16 Strawberry North
2018-01-04 02:21:16 Rasberry North
2018-01-04 12:41:12 Vanilla North
2018-01-05 15:32:15 Chocolate North
I would like to get counts like this:
timestamp strawberry chocolate
1/2/14 0 1
1/3/14 2 0
1/4/14 1 0
1/4/14 0 0
1/4/14 0 0
1/5/14 0 1
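Because this is time series data, I have been storing the timestamp in DatetimeIndex format.
To start with, I just wanted the count of "Strawberry". I ended up with this code, which does not work: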
mydf = (inputdf.set_index('timestamp').groupby(pd.Grouper(freq = 'D'))['IceCreamOrder'].count('Strawberry'))
This results in the error:
TypeError: count() takes 1 positional argument but 2 were given
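Any help would be greatly appreciated.
Compare the column against the string with eq (==), then aggregate with sum to count the True values, since each True is treated as 1: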
# convert to datetimes if necessary
# (this format matches strings such as 1/2/14; omit it for ISO-style timestamps)
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'], format='%m/%d/%y')
print (inputdf)
timestamp IceCreamOrder Location
0 2018-01-02 Chocolate South
1 2018-01-03 Vanilla North
2 2018-01-03 Strawberry North
3 2018-01-03 Strawberry North
4 2018-01-04 Strawberry North
5 2018-01-04 Rasberry North
6 2018-01-04 Vanilla North
7 2018-01-05 Chocolate North
mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
                .eq('Strawberry')
                .groupby(pd.Grouper(freq='D'))
                .sum())
print (mydf)
timestamp
2018-01-02 0.0
2018-01-03 2.0
2018-01-04 1.0
2018-01-05 0.0
Freq: D, Name: IceCreamOrder, dtype: float64
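As a self-contained illustration of the eq-then-sum approach above, here is a minimal sketch that rebuilds the sample rows from the question (with their times) and counts "Strawberry" per day. The variable name strawberry_per_day is only illustrative, and the dtype of the result may differ between pandas versions.

```python
import pandas as pd

# Sample rows copied from the question, including the time component.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 12:41:12", "2018-01-03 14:32:15",
        "2018-01-03 15:32:15", "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": ["Chocolate", "Vanilla", "Strawberry", "Strawberry",
                      "Strawberry", "Rasberry", "Vanilla", "Chocolate"],
    "Location": ["South"] + ["North"] * 7,
})

# eq() yields a boolean Series; summing it per day counts the True values.
strawberry_per_day = (
    inputdf.set_index("timestamp")["IceCreamOrder"]
    .eq("Strawberry")
    .groupby(pd.Grouper(freq="D"))
    .sum()
)
print(strawberry_per_day)
```

The original attempt failed because count() on a groupby takes no value to look for; it only counts non-null entries, so the comparison has to happen before grouping.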
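To count all types, add the IceCreamOrder column to the groupby, aggregate with size, and reshape with unstack.
If none of the datetimes have a time component, you can group directly on the timestamp column: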
mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder Chocolate Rasberry Strawberry Vanilla
timestamp
2018-01-02 1 0 0 0
2018-01-03 0 0 2 1
2018-01-04 0 1 1 1
2018-01-05 1 0 0 0
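Alternatively, use pivot_table with aggfunc='size', or crosstab; both are shown further down.
If your timestamp column includes times, strip them with dt.date before these operations (if you don't want to modify the column, you may want to create a new Series to use for the pivot).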
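Do you want counts for chocolate and strawberry only, or for all types?
I would like counts for all types.
@lespaul - Do all of the datetimes have no times?
My datetimes do have times; adding them now.
@lespaul - Thanks for the new data. The expected output shows 1/4/14 three times; is that a typo?
If the timestamps do include times, group by day with pd.Grouper together with the IceCreamOrder column: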
mydf1 = (inputdf.set_index('timestamp')
                .groupby([pd.Grouper(freq='D'), 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder Chocolate Rasberry Strawberry Vanilla
timestamp
2018-01-02 1 0 0 0
2018-01-03 0 0 2 1
2018-01-04 0 1 1 1
2018-01-05 1 0 0 0
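To tie the all-types count back to the two columns the question asked for, the sketch below rebuilds the question's sample rows, groups by day and flavour, and then selects only Strawberry and Chocolate. The names counts and wanted are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question, including the time component.
inputdf = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 12:41:12", "2018-01-03 14:32:15",
        "2018-01-03 15:32:15", "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": ["Chocolate", "Vanilla", "Strawberry", "Strawberry",
                      "Strawberry", "Rasberry", "Vanilla", "Chocolate"],
})

# One row per day, one column per flavour; missing combinations become 0.
counts = (
    inputdf.set_index("timestamp")
    .groupby([pd.Grouper(freq="D"), "IceCreamOrder"])
    .size()
    .unstack(fill_value=0)
)

# Keep only the flavours of interest, in the order the question listed them.
wanted = counts[["Strawberry", "Chocolate"]]
print(wanted)
```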
# Option 1: pivot_table, counting rows per (timestamp, IceCreamOrder) pair
df.pivot_table(
    index='timestamp', columns='IceCreamOrder', aggfunc='size'
).fillna(0).astype(int)
IceCreamOrder Chocolate Rasberry Strawberry Vanilla
timestamp
2018-01-02 1 0 0 0
2018-01-03 0 0 2 1
2018-01-04 0 1 1 1
2018-01-05 1 0 0 0
# Option 2: crosstab
pd.crosstab(df.timestamp, df.IceCreamOrder)
IceCreamOrder Chocolate Rasberry Strawberry Vanilla
timestamp
2018-01-02 1 0 0 0
2018-01-03 0 0 2 1
2018-01-04 0 1 1 1
2018-01-05 1 0 0 0
# If timestamp includes times, strip them first (this overwrites the column)
df['timestamp'] = df['timestamp'].dt.date
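If you would rather not overwrite the timestamp column, one possible variant is to build a separate Series of dates and pass it to crosstab. This is a sketch under that assumption, using the question's sample rows; the names day and counts are illustrative.

```python
import pandas as pd

# Sample rows copied from the question, including the time component.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 02:21:16", "2018-01-03 12:41:12", "2018-01-03 14:32:15",
        "2018-01-03 15:32:15", "2018-01-04 02:21:16", "2018-01-04 02:21:16",
        "2018-01-04 12:41:12", "2018-01-05 15:32:15",
    ]),
    "IceCreamOrder": ["Chocolate", "Vanilla", "Strawberry", "Strawberry",
                      "Strawberry", "Rasberry", "Vanilla", "Chocolate"],
})

# New Series holding only the date part; df["timestamp"] is left untouched.
# (.dt.normalize() would instead keep a datetime dtype with the time set to midnight.)
day = df["timestamp"].dt.date

# Rows are days, columns are flavours, cells are occurrence counts.
counts = pd.crosstab(day, df["IceCreamOrder"])
print(counts)
```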