R: Using a list to remove a random subset of rows with a specific value within a group


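This is a slight variation on an earlier question. What I'm looking for is how to remove a subset of rows when the number of rows to remove changes each time the grouping condition changes. Below is a simple example data set containing a column of numeric values and a column of numeric groupings (the grouping column could just as well be a factor such as "AA1", "AA2").

For example: when the grouping value equals 15, I want to randomly remove 4 rows; when group = 16, randomly remove 3 rows; when group = 17, randomly remove 7 rows. This process continues for every value of the grouping variable.

Here is my current solution: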

(dfindex<-which(df$a==15)) ##create index that meets the grouping variable criteria
(delete.df.index<-sample(dfindex,4)) ##select number of rows to randomly remove
dfnew<-df[-delete.df.index,] ##create a new data frame and delete the randomly selected rows

Thanks in advance for any input.

Try running this:

set.seed(23)
df<-data.frame(a=round(rnorm(50,mean=20,sd=2)))

# create table of no of rows that need to be removed per each a
noofrowsremove <- read.table(textConnection(
'a toremove
21 1  
23 2  
15 2  
17 1  
19 2  
20 2  
24 2  
16 1
22 1
18 3'), header = TRUE)

library(data.table)

# assign random number in a new column, this will help in sampling
df$tosample <- runif(50)

# convert data.frame to data.table, grouped operations are easier on data.table
dt <- data.table(df)
# rank the tosample column within each unique a value
dt[,samplerank := rank(tosample), by = 'a']
# merge the filtering no of rows with dt
dt <- merge(dt,noofrowsremove, by = 'a')
# filter out rows that have samplerank columns <= the no of rows that need to be removed
dttrimmed <- dt[samplerank > toremove]
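
The ranking trick above may be easier to follow without the merge step. Below is a minimal sketch of the same idea in base R, using ave() to rank random numbers within each group. The named vector to_remove and its values are assumptions made for illustration; they are not the table from the answer.

```r
set.seed(23)
df <- data.frame(a = round(rnorm(50, mean = 20, sd = 2)))

# Hypothetical removal counts, named by group value (not the answer's table)
to_remove <- c("19" = 2, "20" = 2, "21" = 1)

# Random rank of every row within its own group: 1, 2, ..., group size
df$samplerank <- ave(runif(nrow(df)), df$a, FUN = rank)

# Removal count for each row; groups without an entry get 0 and are kept whole
n_drop <- unname(to_remove[as.character(df$a)])
n_drop[is.na(n_drop)] <- 0

# Keeping ranks above n_drop discards exactly n_drop random rows per group
# (or the whole group, if it has fewer rows than that)
trimmed <- df[df$samplerank > n_drop, , drop = FALSE]

# Rows removed per group
table(df$a) - table(factor(trimmed$a, levels = sort(unique(df$a))))
```

Because the ranks within a group are a random permutation of 1 to the group size, dropping the lowest n_drop ranks is equivalent to sampling n_drop rows of that group without replacement.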

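After working through the answer Codoremifa provided, I noticed a few small details that may be worth documenting for others who find this post. Starting from Codoremifa's answer, I made some minor changes and added extra code to illustrate some important details. Basically, pay attention to the merge step and decide how to handle the NA values that the merge step produces.
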
set.seed(23)
df<-data.frame(a=round(rnorm(50,mean=20,sd=2)))

# create table of no of rows that need to be removed per each a
noofrowsremove <- read.table(textConnection(
 'a toremove
21 0  

17 1  
19 2  
20 2  
24 2  
16 1
22 1
18 3'), header = TRUE)

## Values 23 and 15 are deliberately left out of noofrowsremove to illustrate the example below.
# Value 21 is kept with toremove = 0 (i.e., do not remove any rows where a is 21).

library(data.table)

# assign random number in a new column, this will help in sampling
df$tosample <- runif(50) #can also use runif(nrow(df))

# convert data.frame to data.table, grouped operations are easier on data.table
dt <- data.table(df)

# rank the tosample column within each unique a value
dt[,samplerank := rank(tosample), by = 'a']

# merge the per-group removal counts into dt.  Be careful with the merge options.
dt1 <- merge(dt,noofrowsremove, by = 'a') #inner join: 46 rows
dt2 <- merge(dt,noofrowsremove, by = 'a',all=TRUE) #full outer join: 51 rows

# Note the difference in the number of rows between dt1 and dt2.
# dt1 drops groups 23 and 15 entirely because they are not in noofrowsremove.
# In dt2, toremove is NA for those groups (e.g. value 23) because they were not included in noofrowsremove.
nrow(dt1) #46 rows
nrow(dt2) #51 rows

## to keep the groups with NA, change the NA to 0
dt2$toremove[is.na(dt2$toremove)] <- 0 #replace NA with 0

# filter out rows whose samplerank is <= the number of rows that need to be removed
dttrimmed1 <- dt1[samplerank > toremove] #36 rows.  Groups missing from noofrowsremove are excluded
dttrimmed2 <- dt2[samplerank > toremove] #40 rows.  Groups with NA reassigned to 0 are kept
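
The effect of the merge options is easier to see on a tiny example. The sketch below uses base R's merge() on plain data frames with made-up data (dat and lookup are hypothetical names); data.table's merge method accepts the same all and all.x arguments. It also shows a left join (all.x = TRUE), which keeps every data row without adding rows for lookup entries that have no match in the data.

```r
# Small made-up data: group 3 has no entry in the lookup table,
# and lookup group 9 does not occur in the data
dat    <- data.frame(a = c(1, 1, 1, 2, 2, 3))
lookup <- data.frame(a = c(1, 2, 9), toremove = c(2, 1, 1))

inner <- merge(dat, lookup, by = "a")                # drops group 3
full  <- merge(dat, lookup, by = "a", all = TRUE)    # keeps group 3, adds a row for 9
left  <- merge(dat, lookup, by = "a", all.x = TRUE)  # keeps group 3, no extra row

# Groups without a removal count are kept untouched
left$toremove[is.na(left$toremove)] <- 0

c(inner = nrow(inner), full = nrow(full), left = nrow(left))
```

If the goal is only to keep groups that are absent from the lookup table, the left join may be the simpler choice, since a full join can introduce rows that were never in the data.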

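Unless I'm missing something, why not use a for loop? For example, loop over the rows of the table you provided, using dfnew$a==Group[i] and sample(dfindex, Numberofrows[i]). Also, you may want to save your dfnew's in a list, i.e. mylist[[i]]

Thanks @Codoremifa, this works great. I would upvote this answer, but I'm a new user and don't have enough reputation points.

Here's an upvote! 费尔德豪伊, no problem. You might consider accepting the answer by clicking the check mark next to it.
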
The rest of the question's code, which repeats the same steps for the next groups:

(dfindex<-which(dfnew$a==16)) ##create another index from the grouping variable criteria
(delete.df.index<-sample(dfindex,3)) ##select rows to randomly delete
dfnew<-dfnew[-delete.df.index,] ##delete rows

(dfindex<-which(dfnew$a==17))
(delete.df.index<-sample(dfindex,7))
dfnew<-dfnew[-delete.df.index,]

Group   Number of rows to randomly remove
14      0
15      4
16      3
17      7
18      40
19      23
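
The for-loop suggestion from the comments can be sketched in base R as follows. The column names Group and Numberofrows follow the comment; the lookup values are made up so that the example fits the 50-row sample data, and drop_idx is a hypothetical helper name. This is one possible reading of the suggestion, not the commenter's own code.

```r
set.seed(23)
df <- data.frame(a = round(rnorm(50, mean = 20, sd = 2)))

# Hypothetical lookup table; the column names follow the comment,
# the values are made up so the example fits the 50-row sample data
lookup <- data.frame(Group = c(19, 20, 21), Numberofrows = c(2, 2, 1))

drop_idx <- integer(0)
for (i in seq_len(nrow(lookup))) {
  dfindex <- which(df$a == lookup$Group[i])
  if (length(dfindex) == 0) next                      # group not present
  n <- min(lookup$Numberofrows[i], length(dfindex))   # cannot remove more rows than exist
  # Index via sample.int(): sample(dfindex, n) would draw from 1:dfindex
  # when dfindex happens to have length 1
  drop_idx <- c(drop_idx, dfindex[sample.int(length(dfindex), n)])
}

# Guard the empty case: df[-integer(0), ] would return zero rows, not all rows
dfnew <- if (length(drop_idx) > 0) df[-drop_idx, , drop = FALSE] else df
```

Collecting all row indices against the original df and deleting them once avoids having to recompute indices after each deletion, which the step-by-step version has to do.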
