R: Finding the unique values of a specific column within groups

When I try to use length(unique(ID)), it gives the count over all rows rather than the count within each specific group.

data<-sqldf("select count(distinct ID) as distinctID,count(type) as rowCount,type,ag_id,Outcome,bdate,sd_num from buy_pattern group by ag_id,Outcome,sd_num,bdate")


# > data
#   distinctID rowCount type ag_id    Outcome bdate  sd_num
# 1          2        7   A1 A0001 Aggressive  2012 AIG0001
# 2          1        1   B1 B0001   Balanced  2012 AIG0001

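The main cause is that "ID" also exists as a vector object in the global environment. In the dplyr chain, select() does not keep "ID", so inside mutate() the name "ID" is resolved from the global environment instead of from the data. That whole vector has 3 unique elements, and it does not follow the group_by() step, so every group reports 3. Keeping "ID" in select() fixes the problem. n_distinct() can also be used in place of length(unique()).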
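The scoping behaviour described above can be reproduced on a small scale. This is only a sketch: `toy` and its `grp` column are hypothetical stand-ins for buy_pattern, and the outer `ID` vector is created deliberately to mimic the situation in the question.

```r
library(dplyr)

# Hypothetical toy data, not the original buy_pattern
toy <- data.frame(ID = c(1, 1, 2, 3), grp = c("a", "a", "a", "b"))

# A vector with the same name as the column, living outside the data frame
ID <- toy$ID

# select() drops the ID column, so `ID` below resolves to the outer vector
wrong <- toy %>%
  select(grp) %>%
  group_by(grp) %>%
  summarise(distinctID = length(unique(ID)))

# ID is still a column here, so it is evaluated within each group
right <- toy %>%
  group_by(grp) %>%
  summarise(distinctID = n_distinct(ID))

wrong$distinctID  # 3 3 (whole vector, same for every group)
right$distinctID  # 2 1 (per group)
```

If no outer `ID` object existed, the first pipeline would be expected to fail with an "object not found" error instead of silently returning the wrong count.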


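Comments:

We can use n_distinct(). The reason is that you do not have ID in select().

sapply(split(buy_pattern$ID, buy_pattern$Outcome), unique) or tapply(buy_pattern$ID, buy_pattern$Outcome, unique), depending on your definition of a group.

In this case, is there any difference between mutate and summarise?

@Akki The difference is that mutate keeps all the columns, and when you then slice, it gives the first row of each group, whereas summarise will not give "ID": only the group_by columns and the new summary columns appear in the output.

Will summarise provide a performance benefit when running this example on 3.5 million rows?

@Akki With mutate you are creating columns, whereas summarise just summarises. So there should be some performance improvement.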
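The base R route mentioned in the comments can be sketched as follows. The `toy` data frame and its `grp` column are hypothetical, and the anonymous function that counts the unique values is an addition here (the comment itself only passes `unique`, which returns the values rather than their count).

```r
# Hypothetical toy data, not the original buy_pattern
toy <- data.frame(ID = c(1, 1, 2, 3), grp = c("a", "a", "a", "b"))

# Count distinct IDs per group with tapply()
by_tapply <- tapply(toy$ID, toy$grp, function(x) length(unique(x)))

# Same idea with split() + sapply()
by_split <- sapply(split(toy$ID, toy$grp), function(x) length(unique(x)))

by_tapply  # a: 2, b: 1
by_split   # a: 2, b: 1
```

Both forms group on a single vector; grouping on several columns at once would need something like `interaction()` or a list of factors, which is where the dplyr version is more convenient.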
# Attempt from the question: select() drops ID before group_by(),
# so length(unique(ID)) does not see the grouped column
data <- buy_pattern %>%
  select(type, ag_id, Outcome, bdate, sd_num) %>%
  group_by(type, ag_id, Outcome, sd_num, bdate) %>%
  mutate(rowCount = n(), distinctID = length(unique(ID))) %>%
  arrange(ag_id, Outcome, sd_num, desc(rowCount)) %>%
  slice(1)

# > data
#   distinctID rowCount type ag_id    Outcome bdate  sd_num
# 1          3        7   A1 A0001 Aggressive  2012 AIG0001
# 2          3        1   B1 B0001   Balanced  2012 AIG0001

# Fix from the answer: keep ID in select() so it is evaluated per group
buy_pattern %>%
  select(ID, type, ag_id, Outcome, bdate, sd_num) %>% # ID added here
  group_by(type, ag_id, Outcome, sd_num, bdate) %>%
  mutate(rowCount = n(), distinctID = length(unique(ID))) %>%
  arrange(ag_id, Outcome, sd_num, desc(rowCount)) %>%
  slice(1)
# A tibble: 2 x 8
# Groups:   type, ag_id, Outcome, sd_num, bdate [2]
#     ID   type  ag_id    Outcome  bdate  sd_num rowCount distinctID
#   <dbl> <fctr> <fctr>     <fctr> <fctr>  <fctr>    <int>      <int>
#1     1     A1  A0001 Aggressive   2012 AIG0001        7          2
#2     3     B1  B0001   Balanced   2012 AIG0001        1          1

# Alternative: summarise() with n_distinct() instead of mutate() + slice(1)
buy_pattern %>%
  group_by(type, ag_id, Outcome, sd_num, bdate) %>%
  summarise(rowCount = n(), distinctID = n_distinct(ID))
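
The difference between the mutate() + slice(1) route and summarise(), as discussed in the comments, shows up in which columns come back. A minimal sketch on hypothetical toy data (`toy` and `grp` are not from the original question):

```r
library(dplyr)

# Hypothetical toy data, not the original buy_pattern
toy <- data.frame(ID = c(1, 1, 2, 3), grp = c("a", "a", "a", "b"))

# mutate() keeps every column; slice(1) then keeps the first row per group
via_mutate <- toy %>%
  group_by(grp) %>%
  mutate(rowCount = n(), distinctID = n_distinct(ID)) %>%
  slice(1) %>%
  ungroup()

# summarise() returns only the grouping columns and the new summaries
via_summarise <- toy %>%
  group_by(grp) %>%
  summarise(rowCount = n(), distinctID = n_distinct(ID))

names(via_mutate)     # "ID" "grp" "rowCount" "distinctID"
names(via_summarise)  # "grp" "rowCount" "distinctID"
```

The counts agree between the two; the `ID` value left over in the mutate() version is just whichever row happened to be first in each group.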