R: Randomly select rows based on a bounded random number

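This seems simple, but I can't figure it out.

I have a bunch of animal location data in a data frame. I am trying to randomly select X locations per individual for further analysis, with the caveat that X must fall in the range 6-156.

So I tried to set up a loop that first randomly picks a value in the range 6-156 and then uses that value, say 56, to randomly draw 56 locations from the first animal, and so on for each animal.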

for(i in unique(ANIMALS$ID)){
  sub<-sample(6:156,1)
sub2<-i([sample(nrow(i),sub),])
}

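Here is one approach using mapply. This function takes two lists (or objects that can be coerced to lists) and applies the function FUN to their corresponding elements.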

# simulate some data
xy <- data.frame(animal = rep(1:10, each = 10), loc = runif(100))

# calculate number of samples for individual animal
num.samples.per.animal <- sample(3:6, length(unique(xy$animal)), replace = TRUE)

num.samples.per.animal
 [1] 6 3 4 4 6 3 3 6 3 5

# subset random x number of rows from each animal
result <- do.call("rbind", 
                  mapply(num.samples.per.animal, split(xy, f = xy$animal), FUN = function(x, y) {
                    y[sample(1:nrow(y), x),]
                  }, SIMPLIFY = FALSE)
)
result

    animal        loc
7        1 0.99483999
1        1 0.50951321
10       1 0.36505294
6        1 0.34058842
8        1 0.26489107
9        1 0.47418823
13       2 0.27213396
12       2 0.28087775
15       2 0.22130069
23       3 0.33646632
21       3 0.02395097
28       3 0.53079981
29       3 0.85287600
35       4 0.84534073
33       4 0.87370167
31       4 0.85646813
34       4 0.11642335
46       5 0.59624723
48       5 0.15379729
45       5 0.57046122
42       5 0.88799675
44       5 0.62171858
49       5 0.75014593
60       6 0.86915983
54       6 0.03152932
56       6 0.66128549
64       7 0.85420774
70       7 0.89262455
68       7 0.40829671
78       8 0.19073661
72       8 0.20648832
80       8 0.71778913
73       8 0.77883677
75       8 0.37647108
74       8 0.65339300
82       9 0.39957202
85       9 0.31188471
88       9 0.10900795
100     10 0.55282999
95      10 0.10145296
96      10 0.09713218
93      10 0.64900866
94      10 0.76099256
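
One way to sanity-check this result is to count the rows returned for each animal and compare the counts with the requested sample sizes. The following is a minimal sketch that repeats the simulation above; the seed value and the name rows.per.animal are assumptions added here for illustration, not part of the original answer.

```r
set.seed(42)  # arbitrary seed (an assumption), only so the run is reproducible

# simulate some data, as in the answer above
xy <- data.frame(animal = rep(1:10, each = 10), loc = runif(100))
num.samples.per.animal <- sample(3:6, length(unique(xy$animal)), replace = TRUE)

# x is the requested number of rows, y is the data for one animal
result <- do.call("rbind",
                  mapply(num.samples.per.animal, split(xy, f = xy$animal),
                         FUN = function(x, y) y[sample(1:nrow(y), x), ],
                         SIMPLIFY = FALSE))

# rows returned per animal, in the same (sorted) order that split() uses
rows.per.animal <- as.vector(table(result$animal))

# stops with an error if any animal received a different number of rows
stopifnot(identical(rows.per.animal, as.integer(num.samples.per.animal)))
```

The comparison relies on split() and table() both ordering the groups by animal ID, which holds here because both sort the same set of values.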

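Edit

Below is another, more direct approach. It also handles the case where an animal has fewer rows than the number of samples drawn for it, by redrawing the sample size until it fits.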

set.seed(357)
result <- do.call("rbind",
                  by(xy, INDICES = xy$animal, FUN = function(x) {
                    avail.obs <- nrow(x)

                    num.rows <- sample(3:15, 1)
                    while (num.rows > avail.obs) {
                      message("Sample size larger than available data points, repeating sampling.")
                      num.rows <- sample(3:15, 1)
                    }
                    x[sample(1:avail.obs, num.rows), ]
                  }))
result

I love Stack Overflow because I learn so much. @RomanLustrik provided a simple solution; mine is also straightforward:

# simulate some data
xy <- data.frame(animal = rep(1:10, each = 10), loc = runif(100))

newVec <- NULL # Start empty; rbind() builds the data frame as rows are added

for(i in unique(xy$animal)){
  # Sample a number between 1 and 10 (or 6 and 156, if you need)
    samp <- sample(1:10, 1)
  # Determine which rows of xy belong to animal i
  # (compare against i directly so this also works when IDs are not 1..n)
    rows <- which(xy$animal == i)
  # Draw samp rows for animal i; replace = TRUE means a row can be drawn more than once
    newVec1 <- xy[sample(rows, samp, replace = TRUE), ]
  # Append everything to the same new data frame
    newVec <- rbind(newVec, newVec1)
  }

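Can you show part of your data? Perhaps dput(head(ANIMALS))? First, is ANIMALS the name of the data frame, or is ID? The way your unique() statement is written, ANIMALS is the data frame and you are looping over the values of its ID column.

Hi Roman, I get an error with the example: Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'. I should add that each animal has a different number of rows.

@odocoileus I added another solution that works when an animal has fewer points than the sample requires. Another way to handle this would be to take all available points. If you decide to do that, I'll leave writing the if clause as an exercise for you.

Error in sample.int(length(x), size, replace, prob): invalid first argument. Hmm... I thought some individuals with fewer than 156 rows might be messing up the analysis, so I removed them (n = 3) and reran the analysis... I still get the same error message... any ideas? Thanks in advance!

What is the value of length(x)?

length(x) = object not found. length(xy), from your example = 13. I only changed the df and column names from xy$animal to elk$elkID.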
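
The "take all available points" alternative mentioned in the comments could look roughly like this. It is only a sketch under stated assumptions: the simulated group sizes (4, 12 and 20 rows), the 6:15 sampling range, and the use of min() in place of an explicit if clause are illustrative choices, not the original answerer's code.

```r
set.seed(357)

# simulate animals with different numbers of rows (assumed sizes: 4, 12, 20)
xy <- data.frame(animal = rep(1:3, times = c(4, 12, 20)), loc = runif(36))

result <- do.call("rbind",
                  lapply(split(xy, f = xy$animal), function(x) {
                    avail.obs <- nrow(x)
                    # draw a sample size, then cap it at the rows available,
                    # so an animal with too few rows contributes all of them
                    num.rows <- min(sample(6:15, 1), avail.obs)
                    x[sample(seq_len(avail.obs), num.rows), , drop = FALSE]
                  }))

# number of rows kept per animal
table(result$animal)
```

Animal 1 has only 4 rows, fewer than the smallest possible draw of 6, so all 4 of its rows are kept. Sampling is done without replacement, so no row appears twice.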
