Python: determining a customer's visit number by counting dates per year

python pandas

I'm a bit stuck on how to do this in Python; there may be a simpler solution that I couldn't find on Stack Overflow or Google.

I have the following dataframe df:

Customer_ID | date             | year             | Dollars
ABC           2017-02-07         2017               456
ABC           2017-03-05         2017               167
ABC           2016-12-13         2016               320
ABC           2015-04-07         2015               145
BCD           2017-09-08         2017               155
BCD           2016-10-22         2016               274
BCD           2016-10-19         2016               255

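It is a simple dataframe, but a very large one. For each customer I have the dates of their transactions and the amount spent. I created the year column for my analysis: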

#ensured data is in date format
df['date']=pd.to_datetime(df['date'], format='%Y-%m-%d')

#year of transaction as per comment from @Andrew L
df['year'] = df['date'].dt.year
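To make the setup reproducible, here is a minimal sketch that rebuilds the sample rows from the question and derives the year column the same way. The literal values are copied from the table above; building the frame from a dict is just an assumption about how the data might be loaded.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "date": ["2017-02-07", "2017-03-05", "2016-12-13", "2015-04-07",
             "2017-09-08", "2016-10-22", "2016-10-19"],
    "Dollars": [456, 167, 320, 145, 155, 274, 255],
})

# Parse the strings into datetime64 values, then extract the calendar year.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df["year"] = df["date"].dt.year
```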

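I would like to do the following:

Calculate each customer's visit number across their entire transaction history
Calculate each customer's visit number within each year

So I am looking for this output: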

Customer_ID| date     | year | Dollars |visit# |17visit#| 16visit# | 15visit#
    ABC     2017-02-07  2017   456         3      1         0          0               
    ABC     2017-03-05  2017   167         4      2         0          0
    ABC     2016-12-13  2016   320         2      0         1          0
    ABC     2015-04-07  2015   145         1      0         0          1
    BCD     2017-09-08  2017   155         3      1         0          0
    BCD     2016-10-22  2016   274         2      0         2          0
    BCD     2016-10-19  2016   255         1      0         1          0

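I'm not sure where to start. It is probably something along the lines of groupby and count, but applied to dates?

Any ideas or suggestions would be greatly appreciated. Thanks.

Using your data: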

df
  Customer_ID        date  year  Dollars
0         ABC  2017-02-07  2017      456
1         ABC  2017-03-05  2017      167
2         ABC  2016-12-13  2016      320
3         ABC  2015-04-07  2015      145
4         BCD  2017-09-08  2017      155
5         BCD  2016-10-22  2016      274
6         BCD  2016-10-19  2016      255

Find the cumulative visit count for each customer within each year:

df['visit_yr'] = df.groupby(['Customer_ID', 'year']).cumcount()+1

We now have visit_yr, the visit number within each year:

df
  Customer_ID        date  year  Dollars  visit_yr
0         ABC  2017-02-07  2017      456         1
1         ABC  2017-03-05  2017      167         2
2         ABC  2016-12-13  2016      320         1
3         ABC  2015-04-07  2015      145         1
4         BCD  2017-09-08  2017      155         1
5         BCD  2016-10-22  2016      274         1
6         BCD  2016-10-19  2016      255         2
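As a small illustration of what cumcount() does here: it returns each row's zero-based position within its group, in the frame's current row order, which is why the answer adds 1. The tiny frame below is made up for the example.

```python
import pandas as pd

# Hypothetical mini frame: ABC has two 2017 rows and one 2016 row.
df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "BCD"],
    "year": [2017, 2017, 2016, 2017],
})

# Zero-based position of each row inside its (Customer_ID, year) group.
position = df.groupby(["Customer_ID", "year"]).cumcount()

# Shift to a one-based visit number.
visit_yr = position + 1
```

Because the numbering follows row order, rows that are not sorted by date get numbered in the order they appear, not chronologically.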

Using this, we can pivot the years into columns (labelled by their last two digits), replace the NaNs with 0, and join the result back to df:

import numpy as np

# Pivot visit_yr into one column per two-digit year, fill gaps with 0, join back to df.
df.join(df.assign(yr_2=df.year.astype(str).str[2:] + 'visit')
          .pivot(columns='yr_2', values='visit_yr')
          .replace(np.nan, 0.0)).drop('visit_yr', axis=1)
  Customer_ID        date  year  Dollars  15visit  16visit  17visit
0         ABC  2017-02-07  2017      456      0.0      0.0      1.0
1         ABC  2017-03-05  2017      167      0.0      0.0      2.0
2         ABC  2016-12-13  2016      320      0.0      1.0      0.0
3         ABC  2015-04-07  2015      145      1.0      0.0      0.0
4         BCD  2017-09-08  2017      155      0.0      0.0      1.0
5         BCD  2016-10-22  2016      274      0.0      1.0      0.0
6         BCD  2016-10-19  2016      255      0.0      2.0      0.0
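The one-liner above can also be unpacked into separate steps, which may be easier to follow. This is only a sketch of the same pivot-and-join idea; the names labels, wide and result are illustrative, and fillna(0).astype(int) is swapped in for replace(np.nan, 0.0) so the counts come out as integers.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "year": [2017, 2017, 2016, 2015, 2017, 2016, 2016],
})
df["visit_yr"] = df.groupby(["Customer_ID", "year"]).cumcount() + 1

# Column label for each row: last two digits of the year plus a suffix.
labels = df["year"].astype(str).str[2:] + "visit"

# One column per label; each row only has a value under its own year,
# so the remaining cells are NaN until they are filled with 0.
wide = (
    df.assign(yr_2=labels)
      .pivot(columns="yr_2", values="visit_yr")
      .fillna(0)
      .astype(int)
)

# Join on the shared row index and drop the helper column.
result = df.drop(columns="visit_yr").join(wide)
```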

Visit count across the entire dataset:

df['visit'] = df.groupby('Customer_ID').cumcount()+1
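One caveat: cumcount() numbers rows in their current order, while the output requested in the question numbers visits chronologically (for ABC, the 2015 row is visit 1 and 2017-03-05 is visit 4). A possible way to get that, assuming the date column is already a datetime, is to sort first and let index alignment put the numbers back on the original rows. The variable name ordered is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "date": ["2017-02-07", "2017-03-05", "2016-12-13", "2015-04-07",
             "2017-09-08", "2016-10-22", "2016-10-19"],
})
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year

# Sort a copy chronologically within each customer; the original index is kept.
ordered = df.sort_values(["Customer_ID", "date"])

# Assigning a Series aligns on the index, so df keeps its original row order.
df["visit"] = ordered.groupby("Customer_ID").cumcount() + 1
df["visit_yr"] = ordered.groupby(["Customer_ID", "year"]).cumcount() + 1
```

If the frame is already sorted by customer and date, the plain cumcount() calls give the same numbers.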

I'd strongly advise against doing this:

df['year'] = df['date'].astype(str).str[:4]

You should do this instead:

df['year'] = df['date'].dt.year

Thanks a lot for the suggestion; changing it right away.

You're the best! Thank you all. I've upvoted it and am testing it now; I knew it had to be a groupby thing!

@jeangelj Happy to help. If this answers your question, please accept my answer. Thanks again.

Sure, I'm just testing it on my large dataset; thanks for your patience.
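To illustrate the point made in the comments about string slicing versus .dt.year, the sketch below compares the two on a couple of made-up dates. Slicing yields strings, so numeric comparisons against the year silently fail, while .dt.year yields integers.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2017-02-07", "2016-12-13"]))

# String slicing: each value is text such as "2017".
year_str = dates.astype(str).str[:4]

# Datetime accessor: each value is an integer such as 2017.
year_int = dates.dt.year

# Comparing against the number 2017 only matches the integer version.
matches_str = (year_str == 2017).tolist()
matches_int = (year_int == 2017).tolist()
```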