R: group by number of locations
I would like to find out how the orders are distributed by number of locations: how many order numbers have one location, how many have two, and so on. For example:
Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5",
"x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
(test <- data.frame(cbind(Nr, location)))
> test
   Nr location
1  x1        a
2  x2        b
3  x2        a
4  x2        b
5  x3        c
6  x4        a
7  x4        a
8  x4        a
9  x5        a
10 x5        b
11 x5        c
12 x6        d
I have sorted the table by order number and location, including the corresponding quantity:
# A tibble: 9 x 3
# Groups: Nr [6]
  Nr    location quantity
  <fct> <fct>       <int>
1 x1    a               1
2 x2    a               1
3 x2    b               2
4 x3    c               1
5 x4    a               3
6 x5    a               1
7 x5    b               1
8 x5    c               1
9 x6    d               1
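Unfortunately, I don't know how to go on from here. Can anyone help me?

One option is to create a column holding the number of distinct elements of "location" for each "Nr", and then get the count: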
library(dplyr)
test %>%
group_by(Nr) %>%
mutate(n_Order = n_distinct(location)) %>%
ungroup %>%
count(n_Order)
# A tibble: 3 x 2
# n_Order n
# <int> <int>
#1 1 6
#2 2 3
#3 3 3
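A minimal sketch of why the answer above uses mutate() rather than summarise(), assuming the question's data rebuilt with data.frame(Nr, location). The names per_order, by_order and by_row are illustrative only. Collapsing to one row per order first makes count() tally orders, whereas keeping all 12 rows makes it tally observations, which is what the 6, 3, 3 result reflects.

```r
library(dplyr)

# The question's data, built directly (no cbind) for this sketch
Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", "x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
test <- data.frame(Nr, location)

# One row per order: number of distinct locations for that order
per_order <- test %>%
  group_by(Nr) %>%
  summarise(n_Order = n_distinct(location))

# Counting the collapsed rows tallies orders per location count
by_order <- per_order %>% count(n_Order)

# Keeping every original row (mutate) tallies observations instead
by_row <- test %>%
  group_by(Nr) %>%
  mutate(n_Order = n_distinct(location)) %>%
  ungroup() %>%
  count(n_Order)

by_order$n  # 4 1 1: four orders have one location, one has two, one has three
by_row$n    # 6 3 3: rows belonging to those orders
```

Which of the two tallies is wanted depends on whether the unit of interest is the order or the individual row.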
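Here is a data.table version whose output matches your numbers (in this transcript the order-number column is named order rather than Nr):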
library(data.table)
test <- as.data.table(test)
> str(test)
Classes ‘data.table’ and 'data.frame': 12 obs. of 2 variables:
$ order : chr "x1" "x2" "x2" "x2" ...
$ location: chr "a" "b" "a" "b" ...
- attr(*, ".internal.selfref")=<externalptr>
> test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)][, .(observations = sum(total_qty)), by = .(num_locations)]
num_locations observations
1: 1 6
2: 2 3
3: 3 3
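The transcript above groups by a column called order, which the question's data frame does not have. A hedged sketch of the same two-step aggregation against the question's column name Nr follows. It swaps in uniqueN() for length(unique()), and the variable names per_order and result are illustrative, not the answerer's.

```r
library(data.table)

# The question's data, with the order-number column named Nr
test <- data.table(
  Nr = c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", "x6"),
  location = c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
)

# Step 1: per order, count distinct locations and total rows
per_order <- test[, .(num_locations = uniqueN(location), total_qty = .N), by = Nr]

# Step 2: sum the rows over orders that share the same number of locations
result <- per_order[, .(observations = sum(total_qty)), by = num_locations]

# Sort for a stable display order
result <- result[order(num_locations)]
print(result)
```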
Try test %>% group_by(Nr) %>% summarise(number = n_distinct(location))
Do you need test2 %>% group_by(quantity) %>% summarise(number = n())?
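As a side note, data.frame(cbind(Nr, location)) is a worse way of writing data.frame(Nr, location). In this case the result is equivalent because everything starts out as character, but if you had numeric columns, cbind() would first build a character matrix, so every column, numbers included, would end up as character (or as factor in R versions before 4.0.0).
I'm afraid not. I have tried to show what I mean in the following figure: [figure]
Almost. There is a problem with the number of observations (or am I counting incorrectly?). The result should be n_Order: 1, 2, 3; n: 6, 3, 3.
@fuul It is not clear from the comments. Do you need to replicate the rows?
@fuul Based on the example, the counts are 7, 2, 3.
OK, here is another figure showing what it should look like: [figure]. Green stands for one location, purple for two locations, and yellow for three (different) locations.
@aktrun Thank you very much for your effort!
Thank you for the alternative solution! I am a little intimidated by the data.table syntax, but I think I should take a look at the package.
@fuul It is similar to an SQL query. Here is a good introduction: [link]. I have found it to be a very fast and useful package that helps me get things done in fewer lines of code (generally speaking). Definitely take a look!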
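The side note about cbind() can be checked with a small sketch. The qty column and the names via_cbind and direct are made up for illustration and are not part of the question's data.

```r
# Hypothetical data: one character column and one numeric column
Nr  <- c("x1", "x2", "x3")
qty <- c(10, 20, 30)

# cbind() builds a character matrix first, so qty loses its numeric type
via_cbind <- data.frame(cbind(Nr, qty))

# Passing the vectors directly keeps each column's own type
direct <- data.frame(Nr, qty)

class(via_cbind$qty)  # "character" in R >= 4.0.0, "factor" in earlier versions
class(direct$qty)     # "numeric"
```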
Or, in base R:
with(test, table(ave(location, Nr, FUN = function(x) length(unique(x)))))
# intermediate data.table test2
> test2 <- test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)]
> test2[, .(observations = sum(total_qty)), by = .(num_locations)]
num_locations observations
1: 1 6
2: 2 3
3: 3 3