R: Randomly delete rows of a subgroup defined by two conditions (crosstab)

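I have a data frame with the following columns: date, outcome (no or yes), and group (one or two). Cross-tabulating outcome by group gives me something like this: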

      one two
  no  260 271
  yes 235 234

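I would now like to randomly delete rows from one cell (or sample from it; I'm not sure which approach is better, and I believe both should have the same effect, though I'm not certain), say the upper-right cell (group two and outcome no), and keep only 10% of its data.

Can anyone point me in the right direction as to which commands and conditions I need to use?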
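
On the sample-versus-delete question: keeping a uniformly random 10% of a cell's rows and deleting a uniformly random 90% of them amount to the same thing, because the complement of a uniform random subset is itself a uniform random subset. The sketch below illustrates this in base R on a small made-up data frame (`toy` and its counts of 50 rows per cell are hypothetical, not the asker's data).

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical toy data: 50 rows in each of the four outcome/group cells
toy <- expand.grid(outcome = c("no", "yes"), group = c("one", "two"),
                   stringsAsFactors = FALSE)
toy <- toy[rep(seq_len(nrow(toy)), each = 50), ]
rownames(toy) <- NULL

idx <- which(toy$group == "two" & toy$outcome == "no")  # rows of the target cell
k   <- round(0.1 * length(idx))                          # cell rows to keep (here 5)

# Approach A: sample 10% of the cell to keep, plus every row outside the cell
keep_rows <- c(idx[sample.int(length(idx), k)], setdiff(seq_len(nrow(toy)), idx))
kept_a <- toy[sort(keep_rows), ]

# Approach B: sample 90% of the cell to delete
drop_rows <- idx[sample.int(length(idx), length(idx) - k)]
kept_b <- toy[-drop_rows, ]

# Both leave 5 rows in the two/no cell and 50 in every other cell;
# which 5 rows survive depends on the random draw
table(kept_a$outcome, kept_a$group)
table(kept_b$outcome, kept_b$group)
```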

You can do it like this:

idx <- which(Data$group=="one" & Data$outcome=="no") #identify relevant group

Data2 <- Data[-sample(idx, 0.9*length(idx), replace=FALSE),] #sample 90% to remove

table(Data2$outcome, Data2$group)
      one two
  no   28 260
  yes 234 235

table(Data$outcome, Data$group)         
      one two
  no  271 260
  yes 234 235
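
Two base-R edge cases are worth guarding against with this pattern. First, `sample(x, size)` treats a length-one positive number `x` as `1:x`, so a cell with a single matching row would be sampled from the wrong indices. Second, `Data[-integer(0), ]` returns zero rows rather than all rows, so if the number of rows to remove works out to 0, the whole data frame disappears. (A fractional `size` is truncated, which is why 0.9 * 271 = 243.9 leads to 243 removals and 28 remaining rows above.) A hedged sketch of a safer variant follows; the helper name `drop_fraction` is hypothetical.

```r
# Hypothetical helper: randomly remove a fraction `prop` of the rows where
# `cond` is TRUE and keep every other row unchanged.
drop_fraction <- function(data, cond, prop = 0.9) {
  idx <- which(cond)
  n_drop <- floor(prop * length(idx))  # explicit version of sample()'s truncation
  if (n_drop == 0) {
    return(data)  # data[-integer(0), ] would return zero rows, not all rows
  }
  # Sample positions within idx rather than idx itself, so a single matching
  # row is not misread by sample() as "sample from 1:idx"
  rows_out <- idx[sample.int(length(idx), n_drop)]
  data[-rows_out, , drop = FALSE]
}

set.seed(42)
# Edge case: one matching row, and floor(0.9 * 1) is 0, so nothing is removed
small <- data.frame(group = c("one", "two", "two"), outcome = c("no", "no", "yes"))
res_small <- drop_fraction(small, small$group == "two" & small$outcome == "no")

# Regular case: 100 matching rows, 90 removed at random, 10 kept
big <- data.frame(group = rep(c("one", "two"), each = 100), outcome = "no")
res_big <- drop_fraction(big, big$group == "two" & big$outcome == "no")
table(res_big$outcome, res_big$group)
```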

Here is a tidyverse solution:
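library(tidyverse)
Data2 <-
  Data %>%
  # one data frame per group/outcome combination
  # (group_by() first; passing columns to group_indices() is deprecated in dplyr >= 1.0.0)
  split(group_indices(group_by(., group, outcome))) %>%
  # only in the two/no piece, keep a random 10% of the rows
  purrr::modify_if(~first(.$group)=="two" & first(.$outcome)=="no",
                   ~slice(.,sample(nrow(.),round(nrow(.)/10)))) %>%
  bind_rows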



table(Data2$outcome, Data2$group)
# one two
# no  271  26
# yes 234 235
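
For readers without the tidyverse, the same split, modify and recombine idea can be sketched in base R with `split()`, `lapply()` and `do.call(rbind, ...)`. This is a swapped-in base-R equivalent rather than part of the answer above; the function `sample_cell` and the toy data are hypothetical.

```r
# Hypothetical base-R analogue of the split / modify_if / bind_rows pipeline:
# split into one data frame per group/outcome cell, subsample the target
# cell, then stack the pieces again.
sample_cell <- function(data, g, o, prop = 0.1) {
  pieces <- split(data, list(data$group, data$outcome), drop = TRUE)
  pieces <- lapply(pieces, function(d) {
    if (d$group[1] == g && d$outcome[1] == o) {
      # keep a random round(prop * n) rows of the target cell
      d[sample.int(nrow(d), round(prop * nrow(d))), , drop = FALSE]
    } else {
      d  # other cells pass through unchanged
    }
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(7)
toy <- data.frame(group   = rep(c("one", "two"), each = 40),
                  outcome = rep(c("no", "yes"), times = 40))
res <- sample_cell(toy, "two", "no")
table(res$outcome, res$group)
```

As with the tidyverse version, the result is stacked cell by cell, so the original row order is not preserved.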

Write a function to make it more general:
get_reduced_data <- function(Data, group, outcome) {
   #Get indices of the subset which satisfies our condition
   indx = which(Data$group == group & Data$outcome == outcome)
   #Select only 10% from the subset and keep remaining rows as it is
   Data[c(sample(indx, length(indx) * 0.1), setdiff(seq(nrow(Data)), indx)), ]
}

df = get_reduced_data(Data, "two", "no")

table(df$outcome, df$group)

#      one two
#  no  271  26
#  yes 234 235

df = get_reduced_data(Data, "one", "no")

table(df$outcome, df$group)

#      one two
#  no   27 260
#  yes 234 235
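
`get_reduced_data()` returns the sampled rows first and the untouched rows after them, so the original row order (for example, by date) is lost. If order matters, one option is a logical "keep" mask. The sketch below is a hypothetical variant, not the answerer's code; the name `get_reduced_data_ordered` and the `prop` argument are assumptions.

```r
# Hypothetical order-preserving variant of get_reduced_data(): build a logical
# "keep" mask instead of concatenating index vectors.
# Assumes the group and outcome columns contain no NAs.
get_reduced_data_ordered <- function(Data, group, outcome, prop = 0.1) {
  in_cell <- Data$group == group & Data$outcome == outcome
  idx <- which(in_cell)
  keep <- !in_cell  # every row outside the target cell is kept
  # Mark a random prop share of the cell as kept; sampling positions within
  # idx avoids sample()'s special case for a length-one vector
  keep[idx[sample.int(length(idx), floor(prop * length(idx)))]] <- TRUE
  Data[keep, , drop = FALSE]  # logical indexing keeps the original row order
}

set.seed(3)
toy <- data.frame(id = 1:40, group = rep(c("one", "two"), 20), outcome = "no")
res <- get_reduced_data_ordered(toy, "two", "no")
res$id  # ids of surviving rows, still in increasing order
```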

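Keep 10% from the "two" and "no" group? I.e. only 27 rows in that group? By the way, for me the results are swapped for one and two.

Yes, 10% from two/no. The data frame I'm actually working with is much larger, so don't pay too much attention to the numbers. Not sure what's behind the swapped order of the groups...?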
——我实际上看到的数据框要大得多，所以不要太在意数字。不确定分组交换顺序的背后是什么…？从“两个”和“不”分组中保留10%？i、 那一组只有27行吗？顺便说一句，对我来说，结果被交换为one
和two
。是的，10%来自two
/no
——我实际上看到的数据框要大得多，所以不要太在意数字。而且不确定在组的交换顺序后面是什么。。。？