Python: Count string occurrences in a DataFrame with a DatetimeIndex




I have a DataFrame containing a time series like this:

timestamp            IceCreamOrder  Location
2018-01-03 02:21:16  Chocolate      South
2018-01-03 12:41:12  Vanilla        North
2018-01-03 14:32:15  Strawberry     North
2018-01-03 15:32:15  Strawberry     North
2018-01-04 02:21:16  Strawberry     North
2018-01-04 02:21:16  Rasberry       North
2018-01-04 12:41:12  Vanilla        North
2018-01-05 15:32:15  Chocolate      North

I'd like to get counts like this:

timestamp   strawberry  chocolate
1/2/14      0           1
1/3/14      2           0
1/4/14      1           0
1/4/14      0           0
1/4/14      0           0
1/5/14      0           1

Because this is time-series data, I have been storing the timestamps as a DatetimeIndex.
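For readers who want to reproduce the setup, here is a minimal sketch that rebuilds the sample rows above by hand and moves timestamp into a DatetimeIndex. The name indexed is illustrative and does not come from the question.

```python
import pandas as pd

# Sample rows from the question: each timestamp carries a date and a time
inputdf = pd.DataFrame({
    'timestamp': ['2018-01-03 02:21:16', '2018-01-03 12:41:12',
                  '2018-01-03 14:32:15', '2018-01-03 15:32:15',
                  '2018-01-04 02:21:16', '2018-01-04 02:21:16',
                  '2018-01-04 12:41:12', '2018-01-05 15:32:15'],
    'IceCreamOrder': ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
                      'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
    'Location': ['South', 'North', 'North', 'North',
                 'North', 'North', 'North', 'North'],
})

# Parse the strings into datetimes, then move the column into the index
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])
indexed = inputdf.set_index('timestamp')
print(indexed.head())
```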

To start, I only wanted the number of "Strawberry" orders, and I ended up with this code, which doesn't work:

mydf = (inputdf.set_index('timestamp').groupby(pd.Grouper(freq = 'D'))['IceCreamOrder'].count('Strawberry'))

This raises the error:

TypeError: count() takes 1 positional argument but 2 were given

Any help would be appreciated.
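The TypeError arises because the groupby count() method takes no value to search for: it simply returns the number of non-null entries in each group. A short sketch on the question's data (rebuilt as a Series; the names orders and per_day are illustrative) shows what count() actually returns per day:

```python
import pandas as pd

# The question's orders, indexed by their timestamps
orders = pd.Series(
    ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
     'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
    index=pd.to_datetime(['2018-01-03 02:21:16', '2018-01-03 12:41:12',
                          '2018-01-03 14:32:15', '2018-01-03 15:32:15',
                          '2018-01-04 02:21:16', '2018-01-04 02:21:16',
                          '2018-01-04 12:41:12', '2018-01-05 15:32:15']),
    name='IceCreamOrder',
)

# count() accepts no value argument: it counts all non-null orders per day
per_day = orders.groupby(pd.Grouper(freq='D')).count()
print(per_day)
```

Counting one particular value therefore needs a comparison (or another filter) before the aggregation.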

Compare the column against the string with eq (==), then aggregate with sum to count the True values, since True is treated as 1:
import pandas as pd

# convert to datetimes if necessary (the date format is inferred)
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])
print (inputdf)
   timestamp IceCreamOrder Location
0 2018-01-02     Chocolate    South
1 2018-01-03       Vanilla    North
2 2018-01-03    Strawberry    North
3 2018-01-03    Strawberry    North
4 2018-01-04    Strawberry    North
5 2018-01-04      Rasberry    North
6 2018-01-04       Vanilla    North
7 2018-01-05     Chocolate    North

mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
               .eq('Strawberry')
               .groupby(pd.Grouper(freq = 'D'))
               .sum())
print (mydf)
timestamp
2018-01-02    0.0
2018-01-03    2.0
2018-01-04    1.0
2018-01-05    0.0
Freq: D, Name: IceCreamOrder, dtype: float64
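The desired output in the question has one column per flavor. Below is a minimal sketch that extends the eq approach to several flavors, assuming only Strawberry and Chocolate are wanted (the flavors list and the names orders, matches and daily are illustrative). It uses resample('D'), which on a DatetimeIndex groups by calendar day much like groupby(pd.Grouper(freq='D')).

```python
import pandas as pd

# The question's orders, indexed by their timestamps
orders = pd.Series(
    ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
     'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
    index=pd.to_datetime(['2018-01-03 02:21:16', '2018-01-03 12:41:12',
                          '2018-01-03 14:32:15', '2018-01-03 15:32:15',
                          '2018-01-04 02:21:16', '2018-01-04 02:21:16',
                          '2018-01-04 12:41:12', '2018-01-05 15:32:15']),
    name='IceCreamOrder',
)

flavors = ['Strawberry', 'Chocolate']  # hypothetical choice of flavors to count

# One boolean column per flavor (True where the order matches); passing NumPy
# arrays with an explicit index avoids aligning on the duplicate timestamps
matches = pd.DataFrame({f: orders.eq(f).to_numpy() for f in flavors},
                       index=orders.index)

# Summing the booleans per calendar day counts the matching orders
daily = matches.resample('D').sum()
print(daily)
```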

If you want to count all types, add the IceCreamOrder column to the groupby and aggregate:

If none of the datetimes contain times:
mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
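SeriesGroupBy.value_counts can build the same table by counting each flavor within every date group. A sketch using the date-only rows printed earlier in this answer (the name counts is illustrative):

```python
import pandas as pd

# Date-only rows as printed earlier in this answer
inputdf = pd.DataFrame({
    'timestamp': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-03',
                                 '2018-01-03', '2018-01-04', '2018-01-04',
                                 '2018-01-04', '2018-01-05']),
    'IceCreamOrder': ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
                      'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
})

# Count each flavor within each date, then spread flavors into columns
counts = (inputdf.groupby('timestamp')['IceCreamOrder']
                 .value_counts()
                 .unstack(fill_value=0))
print(counts)
```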

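If the timestamp column also contains times, either group by pd.Grouper(freq='D') together with IceCreamOrder, or strip the time with dt.date before any of these operations (if you don't want to modify the column, create a new Series to use for the pivot):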
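Do you want counts for chocolate and strawberry only, or for all types? I'd like to get all types.
@lespaul - Do all the datetimes lack times? My datetimes do have times; I've added them now.
@lespaul - Thanks for the new data. The output lists 1/4/14 three times; is that a typo?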
mydf1 = (inputdf.set_index('timestamp')
               .groupby([pd.Grouper(freq = 'D'),'IceCreamOrder'])
               .size()
               .unstack(fill_value=0))
print (mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
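An alternative for timestamps with times, sketched here as a swapped-in technique rather than the answer's own, floors each timestamp to midnight with dt.floor('D') and groups on that, so the column does not need to be moved into the index first. The rows are the question's; the names day and counts are illustrative.

```python
import pandas as pd

# The question's rows, timestamps including times
inputdf = pd.DataFrame({
    'timestamp': pd.to_datetime(['2018-01-03 02:21:16', '2018-01-03 12:41:12',
                                 '2018-01-03 14:32:15', '2018-01-03 15:32:15',
                                 '2018-01-04 02:21:16', '2018-01-04 02:21:16',
                                 '2018-01-04 12:41:12', '2018-01-05 15:32:15']),
    'IceCreamOrder': ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
                      'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
})

# Round each timestamp down to midnight; the Series keeps the name 'timestamp'
day = inputdf['timestamp'].dt.floor('D')

# Group on (day, flavor), count rows, then spread flavors into columns
counts = (inputdf.groupby([day, 'IceCreamOrder'])
                 .size()
                 .unstack(fill_value=0))
print(counts)
```

Only days that actually contain orders appear as rows in this result.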


Alternatively, with pivot_table:

df.pivot_table(
    index='timestamp', columns='IceCreamOrder', aggfunc='size'
).fillna(0).astype(int)

IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

Or with crosstab:

pd.crosstab(df.timestamp, df.IceCreamOrder)

IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

# if timestamp contains times, drop them before pivot_table / crosstab
df['timestamp'] = df['timestamp'].dt.date
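One consequence of dt.date worth knowing: it replaces the datetime64 column with Python datetime.date objects (object dtype), so datetime-based tools such as resample or pd.Grouper(freq=...) no longer work on that column. A sketch of a swapped-in alternative, assuming the question's timed rows: dt.normalize() also sets every time to midnight but keeps the datetime64 dtype, and crosstab works on it the same way (the names dates, midnights and table are illustrative).

```python
import pandas as pd

# The question's rows, timestamps including times
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2018-01-03 02:21:16', '2018-01-03 12:41:12',
                                 '2018-01-03 14:32:15', '2018-01-03 15:32:15',
                                 '2018-01-04 02:21:16', '2018-01-04 02:21:16',
                                 '2018-01-04 12:41:12', '2018-01-05 15:32:15']),
    'IceCreamOrder': ['Chocolate', 'Vanilla', 'Strawberry', 'Strawberry',
                      'Strawberry', 'Rasberry', 'Vanilla', 'Chocolate'],
})

dates = df['timestamp'].dt.date              # Python date objects, object dtype
midnights = df['timestamp'].dt.normalize()   # still datetime64, time set to 00:00

# Cross-tabulate normalized days against flavors
table = pd.crosstab(midnights, df['IceCreamOrder'])
print(dates.dtype, midnights.dtype)
print(table)
```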