R: Create a new data frame that is larger than the old one

r dataframe

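First of all, thank you for everything you do. I am currently trying to solve a problem (I am new to R). I have a data frame with a small n (e.g. n = 10), but I would like a new data frame that contains more observations (e.g. n = 15). One condition is that every value (i.e. row) of the old data set must appear at least once in the new one. I cannot achieve this with `sample` alone: some rows are sometimes missing.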

Edit, a simple example:

df = data.frame(matrix(rnorm(20), nrow = 10))
df[sample(nrow(df), 14, replace = TRUE), ]
            X1         X2
9    0.5881409  0.1967030
2    1.1227569  1.9827646
1    1.2225747  0.3428867
10  -0.2780021 -2.3581644
4    0.4687276 -2.2431019
5    1.4592202 -0.6397336
7   -0.8779913  0.4293624
3   -0.1663962 -0.2435444
3.1 -0.1663962 -0.2435444
3.2 -0.1663962 -0.2435444
1.1  1.2225747  0.3428867
1.2  1.2225747  0.3428867
6   -1.0797652 -1.1893041
7.1 -0.8779913  0.4293624

However, we can see that row 8, for example, is missing.
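
The problem can be made visible with a quick check. The snippet below is only a sketch and not part of the original post: the seed is arbitrary and the names `idx` and `missing_rows` are made up for illustration. It repeats the `sample()` call from the question and lists the row numbers that were never drawn.

```r
# Illustrative check (not from the original post): which rows were never drawn?
set.seed(42)  # arbitrary seed, only so the run is reproducible
df <- data.frame(matrix(rnorm(20), nrow = 10))

# same sampling call as in the question
idx <- sample(nrow(df), 14, replace = TRUE)

# row numbers of df that do not occur in idx, i.e. rows lost in df[idx, ]
missing_rows <- setdiff(seq_len(nrow(df)), idx)
missing_rows

# number of distinct original rows that made it into the sample
length(unique(idx))
```

Because the draw is made with replacement, nothing forces every row number to occur, so `missing_rows` can be non-empty.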

Perhaps you can try the code below (using `sample()`)

or something like the following (using `replicate`)

Data

df <- structure(list(X1 = c(-0.626453810742332, 0.183643324222082, 
-0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015, 
0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356
), X2 = c(1.51178116845085, 0.389843236411431, -0.621240580541804, 
-2.2146998871775, 1.12493091814311, -0.0449336090152309, -0.0161902630989461, 
0.943836210685299, 0.821221195098089, 0.593901321217509)), class = "data.frame", row.names = c(NA, 
-10L))

df

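The following function does what the question asks.

Explanation:

1. Create a vector `i` made up of a permutation of the row numbers of `X` plus `more` row numbers sampled at random with replacement. If `nrow(X) >= more`, this could be changed to sampling without replacement.
2. Shuffle the vector `i`.
3. Extract rows `i` from the original data frame `X`.
4. Set the row names to consecutive integers and return the result to the caller.
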
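larger_df <- function(X, more){
  # 'more' is the number of extra rows to add on top of nrow(X)
  if(missing(more)) stop(sQuote("more"), " is missing with no default.")
  n <- nrow(X)
  # every row index once, plus 'more' indices drawn with replacement
  i <- c(sample(n), sample(n, more, replace = TRUE))
  # shuffle so the guaranteed rows are mixed with the extra ones
  i <- sample(i)
  Y <- X[i, , drop = FALSE]
  row.names(Y) <- NULL
  Y
}

set.seed(1234)
df1 <- data.frame(matrix(rnorm(20), nrow = 10))

larger_df(df1)              # stops with an error: 'more' is missing
larger_df(df1, 5)           # 10 + 5 = 15 rows
larger_df(df1, 25)          # 10 + 25 = 35 rows
larger_df(data.frame(), 5)  # edge case: empty input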
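
The explanation mentions that the extra rows could be drawn without replacement when `more` does not exceed `nrow(X)`. A minimal sketch of that variant follows. The name `larger_df_norepeat` is hypothetical and not from the answer, and the sketch assumes `X` has at least one row.

```r
# Hypothetical variant (name and details are assumptions, not from the answer):
# draw the extra rows without replacement whenever that is possible.
larger_df_norepeat <- function(X, more) {
  if (missing(more)) stop(sQuote("more"), " is missing with no default.")
  n <- nrow(X)
  # sampling without replacement needs more <= n; otherwise fall back to replacement
  extra <- sample.int(n, more, replace = more > n)
  # every row index once, plus the extra indices
  i <- c(seq_len(n), extra)
  # shuffle by position, which is safe even when i has length one
  i <- i[sample.int(length(i))]
  Y <- X[i, , drop = FALSE]
  row.names(Y) <- NULL
  Y
}

set.seed(1234)
df1 <- data.frame(matrix(rnorm(20), nrow = 10))
res <- larger_df_norepeat(df1, 5)

nrow(res)                 # 15
all(df1$X1 %in% res$X1)   # TRUE: every original row is present
max(table(res$X1))        # 2: no row occurs more than twice when more <= n
```

With `more <= nrow(X)` each original row appears once or twice, so the duplicates are spread more evenly than with replacement.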

Welcome to SO! To help you better, could you provide a small example data frame, and the one you would like to get? To add a good reproducible example, please read:

Hi, it would be helpful if you could elaborate. From the problem statement, my understanding is that you want to merge two data frames and keep all observations from the old one?

Just edited to clarify. Thanks.

@RuiBarradas restoremonic If I specify replace = TRUE, would this work for n_new = 25?


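observations_needed <- 15
# draw the extra row indices (with replacement) needed to reach the target size
new_rows <- sample(
  x = nrow(df),
  size = observations_needed - nrow(df),
  replace = TRUE)

# every original row once, plus the extra draws, in random order
all_rows <- c(seq_len(nrow(df)), new_rows)
result <- sample(all_rows)

new_df <- df[result, ]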
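
A quick way to confirm that this pattern keeps every original row is sketched below. It is not from the original thread: the seed, the example data and the target size of 25 are assumptions chosen to match the larger case asked about in the comments.

```r
# Illustrative check (seed, data and target size are assumptions)
set.seed(1)
df <- data.frame(matrix(rnorm(20), nrow = 10))
observations_needed <- 25

# extra indices drawn with replacement, so the target may exceed 2 * nrow(df)
new_rows <- sample(nrow(df), observations_needed - nrow(df), replace = TRUE)
result <- sample(c(seq_len(nrow(df)), new_rows))
new_df <- df[result, ]

nrow(new_df) == observations_needed   # TRUE: the target size is reached
all(seq_len(nrow(df)) %in% result)    # TRUE: no original row is lost
```

Both checks hold by construction, because `seq_len(nrow(df))` puts each row index into `result` once before any random draw is added.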