R: find the most common value per day


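I want to find the category that occurs most often for each participant on each day. Several categories occur every day, and I would like a new column stating which category occurred most for a given participant on a given date.

I have a "user id" column, a "date" column and a "category" column (character). What code should I use to add a new column that states only the category that occurred most often for a specific user on a specific date?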


Let's create some data:

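    library(dplyr)
    set.seed(100)  # make the random sample reproducible
    # 30 rows: three users, three days, categories drawn at random from 1:3
    data <- data.frame(user_id = rep(c(1, 2, 3), 10),
                       date = rep(c("tuesday", "wednesday", "thursday"), each = 10),
                       category = sample(1:3, 30, replace = TRUE))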

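Now we group the data by user id and date and create a new column named `max` that holds the most frequent category in each group. We do this by calling `table` on `category`, which builds a frequency table of that column for each group: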

    data %>% group_by(user_id,date) %>% 
      dplyr::mutate(max=names(sort(table(category),decreasing=TRUE))[1])

# A tibble: 30 x 4
# Groups:   user_id, date [9]
   user_id date      category max  
     <dbl> <fct>        <int> <chr>
 1       1 thursday         3 3    
 2       1 thursday         2 3    
 3       1 thursday         3 3    
 4       1 tuesday          1 1    
 5       1 tuesday          1 1    
 6       1 tuesday          3 1    
 7       1 tuesday          1 1    
 8       1 wednesday        1 1    
 9       1 wednesday        3 1    
10       1 wednesday        2 1    
# ... with 20 more rows
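
To see why this expression returns the most frequent value, it can help to run each step by hand on the values of a single group. The following is a minimal sketch; the vector `x` is made-up illustration data, not taken from the question.

```r
# Hypothetical values for one user on one day (illustration only)
x <- c("Email", "News", "Email", "Chat", "Email")

counts <- table(x)                          # frequency of each distinct value
ranked <- sort(counts, decreasing = TRUE)   # most frequent value first
top <- names(ranked)[1]                     # label of the first entry
print(top)

# A shorter alternative: which.max() returns the position of the first
# maximum, so in a tie the label that comes first in the table is used
top_alt <- names(which.max(counts))
print(top_alt)
```

Either form returns a character label, which is why the `max` column has type `<chr>` even when `category` is an integer. If ties matter for your analysis, it is probably safer to decide explicitly how they should be broken than to rely on the sort order.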

So I created an identical table, but repeated the last row twice, changed one of the categories to "News", and ran the same code:

# A tibble: 8 x 4
# Groups:   user_id, date [6]
  user_id date                better_category    max               
  <chr>   <dttm>              <chr>              <chr>             
1 10257   2019-03-14 00:00:00 Email              Email             
2 10580   2019-03-08 00:00:00 Internet_Browser   Internet_Browser  
3 10280   2019-02-26 00:00:00 Instant_Messaging  Instant_Messaging 
4 10202   2019-03-02 00:00:00 News               News              
5 10275   2019-03-18 00:00:00 Background_Process Background_Process
6 10281   2019-03-14 00:00:00 News               Instant_Messaging 
7 10281   2019-03-14 00:00:00 Instant_Messaging  Instant_Messaging 
8 10281   2019-03-14 00:00:00 Instant_Messaging  Instant_Messaging

Note the last three rows.
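
The check can be reproduced with a small data frame. This is a sketch only: the rows are copied from the eight-row output above, the object names `test` and `result` are made up here, and the `UTC` time zone is an assumption, since the thread does not state one.

```r
library(dplyr)

# Rows copied from the eight-row output above; tz = "UTC" is an assumption
test <- data.frame(
  user_id = c("10257", "10580", "10280", "10202",
              "10275", "10281", "10281", "10281"),
  date = as.POSIXct(c("2019-03-14", "2019-03-08", "2019-02-26", "2019-03-02",
                      "2019-03-18", "2019-03-14", "2019-03-14", "2019-03-14"),
                    tz = "UTC"),
  better_category = c("Email", "Internet_Browser", "Instant_Messaging", "News",
                      "Background_Process", "News",
                      "Instant_Messaging", "Instant_Messaging"),
  stringsAsFactors = FALSE
)

# Same grouped mutate as in the answer, applied to the character column
result <- test %>%
  group_by(user_id, date) %>%
  dplyr::mutate(max = names(sort(table(better_category), decreasing = TRUE))[1]) %>%
  ungroup()

print(result)
```

User 10281 has one "News" row and two "Instant_Messaging" rows on the same date, so all three rows get "Instant_Messaging", while user 10202 keeps "News". The explicit `dplyr::` prefix guards against another attached package masking `mutate` (plyr, for example, also exports a `mutate` that ignores dplyr groups); whether that was the cause of the asker's original result is not stated in the thread.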

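Could you provide some sample data using `dput` so we can test potential solutions?

@iod I have provided a picture of the data format. The code below does give the most frequent category (column 4, "max"), but it gives the most common category for the whole dataset, not the most common value for each specific user on each day. Do you know how I can fix this?

Don't share data as an image. Share a bit of your data by calling `dput(head(data))` and pasting the output into your question.

See the answer below, revised with fake data. I don't see the behavior you describe.

@iod I think that was indeed the problem. The revised code works! Thank you so much!!

Code-only answers are considered low quality: make sure to explain what your code does and how it solves the problem.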
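
As a rough illustration of the `dput(head(data))` advice in the comments above, the sketch below uses a small made-up data frame (the values are hypothetical stand-ins, not the asker's data) and shows that the printed text is enough to rebuild the object.

```r
# Hypothetical stand-in for the real data (values are illustrative only)
data <- data.frame(
  user_id = c("10257", "10580", "10281"),
  better_category = c("Email", "Internet_Browser", "News"),
  stringsAsFactors = FALSE
)

# Prints R code that rebuilds the first rows; paste that text into the question
dput(head(data))

# Anyone can then recreate the object by evaluating the pasted text
pasted <- paste(capture.output(dput(head(data))), collapse = "\n")
restored <- eval(parse(text = pasted))
print(restored)
```

Unlike a screenshot, the pasted text preserves column types, so anyone answering can test against the same structure.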

Output of the same grouped `mutate` on the six-row table, before its last row was duplicated:

# A tibble: 6 x 4
# Groups:   user_id, date [6]
  user_id date                better_category    max               
  <fct>   <dttm>              <fct>              <chr>             
1 10257   2019-03-14 00:00:00 Email              Email             
2 10580   2019-03-08 00:00:00 Internet_Browser   Internet_Browser  
3 10280   2019-02-26 00:00:00 Instant_Messaging  Instant_Messaging 
4 10202   2019-03-02 00:00:00 News               News              
5 10275   2019-03-18 00:00:00 Background_Process Background_Process
6 10281   2019-03-14 00:00:00 Instant_Messaging  Instant_Messaging
