How can I create a new column in a data.frame that counts the distinct rows of that data.frame? (R)

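I have a huge data frame like the one below.

First, how can I add a new column date1 to this data.frame that numbers each unique date in the data.frame, with the numbers assigned in ascending order?

Second, how can I add another column date2 to this data.frame that counts the distinct ids within each day?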

    year  month day id
    2011    1   5   31
    2011    1   14  22
    2011    2   6   28
    2011    2   17  41
    2011    3   9   55
    2011    1   5   34
    2011    1   14  25
    2011    2   6   36
    2011    2   17  11
    2011    3   12  10

The result I expect looks like this. Please help.

    year month day  id date1 date2
    2011    1   5   31  1     2
    2011    1   14  22  2     2
    2011    2   6   28  3     2
    2011    2   17  41  4     2
    2011    3   9   55  5     1
    2011    1   5   34  1     2
    2011    1   14  25  2     2
    2011    2   6   36  3     2
    2011    2   17  11  4     2
    2011    3   12  10  6     1

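We can first use unite to combine year, month and day into a single column and give each distinct combination a unique number, then group by that same combination and use n_distinct to count the unique ids in each group.

We can do this more concisely in the tidyverse by taking the group index of 'year', 'month', 'day' inside group_by and then creating 'date2' as the number of distinct 'id' elements with n_distinct.

Alternatively, this can be done in base R with interaction and ave.

Data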

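Thanks @Ronak Shah, I have tried it, but the result in the date1 column is not what I expected. I have 3 years of data, and the last day in the data should get the largest number in the new column, but it does not; the largest number goes to some other day. There are errors like this throughout. The second column is fine.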
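The ordering problem described in this comment is consistent with `factor(date, levels = unique(date))` numbering dates by first appearance rather than chronologically; the two only agree when the rows are already sorted by date. The base R sketch below assumes that this is the cause. It uses a deliberately reordered copy of the sample data (the names `shuffled`, `d`, `by_appearance` and `chronological` are illustrative only), builds real `Date` values, and ranks them so that the latest day gets the largest number.

```r
# Sample data from the question.
df1 <- data.frame(
  year  = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L),
  month = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L, 3L),
  day   = c(5L, 14L, 6L, 17L, 9L, 5L, 14L, 6L, 17L, 12L),
  id    = c(31L, 22L, 28L, 41L, 55L, 34L, 25L, 36L, 11L, 10L)
)

# Assumption for illustration: the real data is not sorted by date.
# Move the latest date (2011-03-12) to the first row to mimic that.
shuffled <- df1[c(10, 1:9), ]

# Real Date values sort chronologically; strings such as "2011-1-14" do not.
d <- as.Date(sprintf("%d-%02d-%02d", shuffled$year, shuffled$month, shuffled$day))

# Numbering by first appearance: the latest date wrongly gets 1.
by_appearance <- as.integer(factor(d, levels = unique(d)))

# Numbering by position among the sorted unique dates: the latest date gets 6.
chronological <- match(d, sort(unique(d)))

shuffled$date1 <- chronological
# Count distinct ids per date, as in the base R answer.
shuffled$date2 <- ave(shuffled$id, shuffled$date1,
                      FUN = function(x) length(unique(x)))
print(shuffled)
```

If the rows are already in date order, both numberings coincide, which would explain why the sample data in the question gives the expected result.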

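library(dplyr)
library(tidyr)

df1 %>%
  # build a temporary "year-month-day" key, keeping the original columns
  unite(date, year, month, day, sep = "-", remove = FALSE) %>%
  # number each key in order of first appearance
  mutate(date1 = as.integer(factor(date, levels = unique(date)))) %>%
  group_by(date) %>%
  # count the distinct ids within each date
  mutate(date2 = n_distinct(id)) %>%
  ungroup() %>%
  select(-date)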


#    year month   day    id date1 date2
#   <int> <int> <int> <int> <int> <int>
# 1  2011     1     5    31     1     2
# 2  2011     1    14    22     2     2
# 3  2011     2     6    28     3     2
# 4  2011     2    17    41     4     2
# 5  2011     3     9    55     5     1
# 6  2011     1     5    34     1     2
# 7  2011     1    14    25     2     2
# 8  2011     2     6    36     3     2
# 9  2011     2    17    11     4     2
#10  2011     3    12    10     6     1

library(tidyverse)
df1 %>%
  # group index of each (year, month, day) combination, in sorted key order
  group_by(date1 = group_indices(., year, month, day)) %>%
  mutate(date2 = n_distinct(id))
# A tibble: 10 x 6
# Groups:   date1 [6]
#    year month   day    id date1 date2
#   <int> <int> <int> <int> <int> <int>
# 1  2011     1     5    31     1     2
# 2  2011     1    14    22     2     2
# 3  2011     2     6    28     3     2
# 4  2011     2    17    41     4     2
# 5  2011     3     9    55     5     1
# 6  2011     1     5    34     1     2
# 7  2011     1    14    25     2     2
# 8  2011     2     6    36     3     2
# 9  2011     2    17    11     4     2
#10  2011     3    12    10     6     1
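
In dplyr 1.0.0 and later, passing columns to `group_indices()` is deprecated, and `cur_group_id()` covers the same use case. A minimal sketch of the same idea, assuming a recent dplyr is installed (the name `res` is illustrative and not part of the original answer):

```r
library(dplyr)

# Sample data from the question.
df1 <- data.frame(
  year  = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L),
  month = c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L, 3L),
  day   = c(5L, 14L, 6L, 17L, 9L, 5L, 14L, 6L, 17L, 12L),
  id    = c(31L, 22L, 28L, 41L, 55L, 34L, 25L, 36L, 11L, 10L)
)

res <- df1 %>%
  group_by(year, month, day) %>%
  mutate(
    date1 = cur_group_id(),   # index of the current group, in sorted key order
    date2 = n_distinct(id)    # number of distinct ids within the group
  ) %>%
  ungroup()

print(res)
```

Because the groups are ordered by their sorted keys, `date1` follows chronological order even when the rows themselves are not sorted.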

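library(data.table)
# .GRP numbers the (year, month, day) groups in order of first appearance;
# uniqueN(id) then counts the distinct ids within each date1 group
setDT(df1)[, date1 := .GRP, .(year, month, day)][, date2 := uniqueN(id), date1][]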
#     year month day id date1 date2
# 1: 2011     1   5 31     1     2
# 2: 2011     1  14 22     2     2
# 3: 2011     2   6 28     3     2
# 4: 2011     2  17 41     4     2
# 5: 2011     3   9 55     5     1
# 6: 2011     1   5 34     1     2
# 7: 2011     1  14 25     2     2
# 8: 2011     2   6 36     3     2
# 9: 2011     2  17 11     4     2
#10: 2011     3  12 10     6     1

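# lex.order = TRUE orders the levels by year, then month, then day,
# so the integer codes follow chronological order; drop = TRUE removes
# combinations that never occur
df1$date1 <- with(df1, as.integer(interaction(year, month, day,
                                              drop = TRUE, lex.order = TRUE)))
# number of distinct ids within each date1 group
df1$date2 <- with(df1, ave(id, date1, FUN = function(x) length(unique(x))))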

df1 <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L), month = c(1L, 1L, 2L, 2L, 3L, 1L, 
1L, 2L, 2L, 3L), day = c(5L, 14L, 6L, 17L, 9L, 5L, 14L, 6L, 17L, 
12L), id = c(31L, 22L, 28L, 41L, 55L, 34L, 25L, 36L, 11L, 10L
)), class = "data.frame", row.names = c(NA, -10L))