Programming an R loop (R, For Loop, Rank) - Fatal编程技术网



I need help programming in R. I have a data.frame B with one column, x:

x <- c("300","300","300","400","400","400","500","500","500"....etc.)  # 2 million rows

Here is a way using dplyr; it takes about 0.2 seconds on 2 million rows.

First, I make sample data:

n = 2E6  # number of rows in test
library(dplyr)
sample_data <- data.frame(
  x = round(runif(n = n, min = 1, max = 100000), digits = 0)
) %>%
  arrange(x)  # Optional, added to make output clearer so that each x is adjacent to the others that match.

Here is a solution using base R:

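# Example data: a run of 300s followed by a run of 400s (run lengths drawn at random from 5-10)
B <- data.frame(x = rep(c(300, 400, 400), sample(c(5:10), 3)))
B
# Number the rows within each value of x: 1, 2, 3, ...
B$y <- ave(B$x, B$x, FUN = seq_along)

B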
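In the question, x is a character vector ("300", "400", ...). The short sketch below uses a made-up vector x to show two things about ave(): it writes results back into a copy of its first argument, so ave(x, x, FUN = seq_along) returns character counts, while passing seq_along(x) as the first argument returns integers; and it keeps counting when a value reappears later in the sequence, which is what the comments ask about.

```r
# Made-up character vector in the style of the question; "300" reappears at the end
x <- c("300", "300", "300", "400", "400", "500", "300")

# ave() fills a copy of its first argument, so counting over x itself gives character output
y_chr <- ave(x, x, FUN = seq_along)

# Counting over seq_along(x) instead gives an integer result
y_int <- ave(seq_along(x), x, FUN = seq_along)

print(y_chr)  # "1" "2" "3" "1" "2" "1" "4"
print(y_int)  # 1 2 3 1 2 1 4
```

Note that the last "300" gets 4, not 1: ave() groups by value, not by consecutive run.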
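Are the values in x always repeated three times? Can a value in x repeat later in the sequence? For example, could "300" appear again after "500"?
The values won't always repeat. The next value could be 800, 800. It may not be these values; it could be any column.
Can you clarify? "Won't always repeat" is not the same as "will never repeat".
With library(data.table), you can do setDT(B); B[, y := 1:.N, by = x]
I only have the base packages, and I think dplyr is in another package. Is that right?
Yes, dplyr is a separate package and one of the most popular in R. To add it, type install.packages("dplyr"). I strongly recommend it for this kind of operation.
And is dplyr in the tidyverse?
@dado ave() is base R; something like B$y ...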
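The comments leave open whether a value that comes back later (for example "300" after "500") should continue its count or start again at 1. If the count should restart with each new run of consecutive equal values, one base-R option is to combine rle() with sequence(). This is an alternative interpretation, not one the asker confirmed; x is again a made-up example.

```r
x <- c("300", "300", "300", "400", "400", "500", "300")

# rle() gives the length of each run of consecutive equal values: 3, 2, 1, 1
runs <- rle(x)

# sequence() expands each length n into 1:n and concatenates the results
y_run <- sequence(runs$lengths)
print(y_run)  # 1 2 3 1 2 1 1
```

Here the final "300" gets 1 because it starts a new run, unlike the ave() result, which continues the count.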
Continuing the dplyr answer, number the rows within each value of x and show the first 20 rows:
sample_data_with_rank <- sample_data %>%
  group_by(x) %>%
  mutate(y = row_number()) %>%
  ungroup()

head(sample_data_with_rank, 20)

# A tibble: 20 x 2
       x     y
   <dbl> <int>
 1     1     1
 2     1     2
 3     1     3
 4     1     4
 5     1     5
 6     1     6
 7     1     7
 8     1     8
 9     1     9
10     1    10
11     1    11
12     1    12
13     1    13
14     1    14
15     1    15
16     2     1
17     2     2
18     2     3
19     2     4
20     2     5
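The asker mentioned having only the base packages. As a package-free alternative to group_by() + row_number(), here is a sketch of a vectorized base-R function; rank_within_value is a hypothetical name, not taken from either answer. It sorts once with order() (whose sort is stable, so tied values keep their original relative order), numbers each block of equal sorted values with sequence(rle(...)$lengths), and writes the numbers back to the original row positions. It assumes x contains no NA values, because rle() treats each NA as a separate run. It has not been benchmarked here against the 0.2-second dplyr figure.

```r
# Hypothetical helper: running count of each value of x, in original row order.
# Intended to match ave(seq_along(x), x, FUN = seq_along); assumes no NA in x.
rank_within_value <- function(x) {
  o <- order(x)                          # stable sort: ties keep their original order
  y <- integer(length(x))
  y[o] <- sequence(rle(x[o])$lengths)    # 1, 2, ... within each block of equal sorted values
  y
}

# Small check on unsorted data
x <- c(500, 300, 400, 300, 500, 300)
print(rank_within_value(x))  # 1 1 1 2 2 3

# Same shape of data as the dplyr example (2 million rows), but left unsorted
set.seed(42)
big <- round(runif(n = 2e6, min = 1, max = 100000), digits = 0)
print(system.time(y_big <- rank_within_value(big)))
```

Unlike the dplyr sample data, big is not sorted first, so this also exercises the write-back step.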