R: grouping by number of locations

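I would like to find out how orders are distributed by number of locations: how many order numbers occur at one location, how many at two, and so on.

For example: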

Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", 
"x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
(test <- data.frame(cbind(Nr, location)))

> test
   Nr location
1  x1        a
2  x2        b
3  x2        a
4  x2        b
5  x3        c
6  x4        a
7  x4        a
8  x4        a
9  x5        a
10 x5        b
11 x5        c
12 x6        d

I sorted the table by order number and location and added the corresponding quantity:

# A tibble: 9 x 3
# Groups:   Nr [6]
  Nr    location quantity
  <fct> <fct>     <int>
1 x1    a             1
2 x2    a             1
3 x2    b             2
4 x3    c             1
5 x4    a             3
6 x5    a             1
7 x5    b             1
8 x5    c             1
9 x6    d             1

Unfortunately, I don't know how to do this. Can anyone help me?
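The question does not show how the quantity table was built. One way to reproduce its rows is sketched below in base R. The names quantity_tbl and the use of table() are assumptions made for illustration; the original was evidently produced with a tibble-based tool.

```r
# example data from the question
Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", "x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
test <- data.frame(Nr, location)

# count rows for every (Nr, location) combination
quantity_tbl <- as.data.frame(
  table(Nr = test$Nr, location = test$location),
  responseName = "quantity",
  stringsAsFactors = FALSE
)

# table() also lists combinations that never occur, so drop the zero counts
quantity_tbl <- quantity_tbl[quantity_tbl$quantity > 0, ]

# sort by order number, then by location
quantity_tbl <- quantity_tbl[order(quantity_tbl$Nr, quantity_tbl$location), ]
rownames(quantity_tbl) <- NULL

print(quantity_tbl)
```

The result has one row per order/location pair that actually occurs, which is the starting point for the grouping asked about.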

One option is to create a column holding the number of distinct 'location' values per 'Nr' and then get the count:
library(dplyr)
test %>% 
  group_by(Nr) %>% 
  mutate(n_Order = n_distinct(location)) %>% 
  ungroup %>% 
  count(n_Order)
# A tibble: 3 x 2
#  n_Order     n
#    <int> <int>
#1       1     6
#2       2     3
#3       3     3
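The pipeline above counts rows (observations), not orders: an order with three rows contributes three to its group. The base R sketch below walks through the same two steps and also shows the per-order count for comparison. The variable names n_loc, rows_per_n and orders_per_n are illustrative and not part of the original answer.

```r
# example data from the question
Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", "x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
test <- data.frame(Nr, location)

# step 1: number of distinct locations for each order number
n_loc <- sapply(split(test$location, test$Nr), function(x) length(unique(x)))

# step 2a: map that number back to every row, then count rows
rows_per_n <- table(n_loc[as.character(test$Nr)])

# step 2b: count each order once instead of once per row
orders_per_n <- table(n_loc)

print(rows_per_n)
print(orders_per_n)
```

Which of the two counts is wanted depends on whether repeated rows of the same order should be weighted; the expected result discussed in this thread corresponds to the row count.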

Here is the same output as in your figure, using data.table:
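library(data.table)

test <- as.data.table(test)
# the code below refers to the order-number column as "order" rather than "Nr";
# this assumes a rename such as the following
setnames(test, "Nr", "order")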

> str(test)
Classes ‘data.table’ and 'data.frame':  12 obs. of  2 variables:
 $ order   : chr  "x1" "x2" "x2" "x2" ...
 $ location: chr  "a" "b" "a" "b" ...
 - attr(*, ".internal.selfref")=<externalptr> 

> test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)][, .(observations = sum(total_qty)), by = .(num_locations)]
   num_locations observations
1:             1            6
2:             2            3
3:             3            3
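The chained call above can also be written against the question's original column name Nr, without renaming. The sketch below is a variant under that assumption; it uses uniqueN() in place of length(unique()) and keyby so the result comes back sorted. The names per_order and result are illustrative.

```r
library(data.table)

# example data from the question, built directly as a data.table
test <- data.table(
  Nr = c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", "x6"),
  location = c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
)

# step 1: per order, the number of distinct locations and the number of rows
per_order <- test[, .(num_locations = uniqueN(location), total_qty = .N), by = Nr]

# step 2: add up the rows of all orders that share the same number of locations;
# keyby sorts the result by num_locations
result <- per_order[, .(observations = sum(total_qty)), keyby = num_locations]

print(result)
```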

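Please try test %>% group_by(Nr) %>% summarise(number = n_distinct(location))

Do you need test2 %>% group_by(quantity) %>% summarise(number = n())?

As a side note, data.frame(cbind(Nr, location)) is a worse way of writing data.frame(Nr, location). In this case the results are equivalent because everything starts out as character, but if you had numeric columns, cbind() would first build a character matrix, so every column, numbers included, would come out as character (or as factor in R versions before 4.0.0).

I'm afraid not. I tried to show my intention in the following figure: [image]

Almost. There is a problem with the number of observations (or am I counting incorrectly?). The result should be n_Order: 1, 2, 3; n: 6, 3, 3.

@fuul It is not clear from the comments. Do you need the duplicate rows?

@fuul Based on the example, the counts are 7, 2, 3.

OK, here is another figure showing what it should look like: [image]. Green stands for one location, purple for two locations, and yellow for three (different) locations.

@aktrun Thank you very much for your effort!

Thanks for the alternative solution! I'm a bit scared of the data.table syntax, but I think I should take a look at the package.

@fuul It is similar to an SQL query. Here is a good introduction: [link]. I find it a very fast, useful package that lets me get the job done with (generally) fewer lines of code. Definitely take a look!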
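The side note about cbind() can be checked with a small experiment. The sketch below uses a made-up numeric column qty (not part of the question's data) to show what happens to numbers in each construction.

```r
# hypothetical data: one character column and one numeric column
id <- c("x1", "x2", "x3")
qty <- c(10, 20, 30)

# cbind() builds a character matrix first, so qty loses its numeric type
via_cbind <- data.frame(cbind(id, qty))

# passing the vectors directly keeps each column's own type
direct <- data.frame(id, qty)

print(class(via_cbind$qty))  # not numeric: character or factor, depending on R version
print(class(direct$qty))     # numeric
```

With the direct form, sum(direct$qty) works as expected, whereas the cbind() version would need an explicit conversion back to numeric first.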
# base R alternative; as.character() guards against location being a factor
with(test, table(ave(as.character(location), Nr, FUN = function(x) length(unique(x)))))

# data.table: the same result in two steps, via an intermediate table test2
> test2 <- test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)]

> test2[, .(observations = sum(total_qty)), by = .(num_locations)]
   num_locations observations
1:             1            6
2:             2            3
3:             3            3