R: subsetting a data.table based on all possible combinations of two or more variables

Tags: r, dataframe, data.table, subset, combn

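I want to subset a data.frame based on whether some variables are all positive, all negative, or some combination in between. For n variables, this results in 2^n possible combinations. I think combn could be used to achieve this, but I am struggling to get it right.

Sample data: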

library(data.table)
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))

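Desired output:

dt[x < 0 & y < 0 & z < 0, ]
dt[x < 0 & y < 0 & z > 0, ]
dt[x < 0 & y > 0 & z < 0, ]
dt[x < 0 & y > 0 & z > 0, ]
dt[x > 0 & y < 0 & z < 0, ]
dt[x > 0 & y < 0 & z > 0, ]
dt[x > 0 & y > 0 & z < 0, ]
dt[x > 0 & y > 0 & z > 0, ]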

What I have tried so far:

combinator <- function(z){
  cnames <- colnames(z)
  combinations <- t(combn(c(rep("<", ncol(z)), rep(">", ncol(z))),ncol(z)))

  retval <- t(sapply(1:nrow(combinations), function(p){
    sapply(1:ncol(z), function(q) paste(cnames[q], combinations[p,q], 0))
  }))

  return(apply(retval, 1, paste, collapse = " & "))
}

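Subsetting with the strings this produces fails:

> l <- combinator(dt)
> l[1]
[1] "x < 0 & y < 0 & z < 0"
> subset(dt, eval(l[1]))
Error in subset.data.table(dt, eval(l[1])) : 
  'subset' must evaluate to logical

In addition, the following shows that I am not generating all of the desired combinations: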

> unique(l)
[1] "x < 0 & y < 0 & z < 0" "x < 0 & y < 0 & z > 0" 
[3] "x < 0 & y > 0 & z > 0" "x > 0 & y > 0 & z > 0"

The output should have 8 unique results, not the 4 shown above.
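One likely reason for getting 4 rather than 8: combn picks elements by position in increasing order, so with all the "<" symbols placed before the ">" symbols, every selection is a run of "<" followed by a run of ">". A minimal sketch of the difference, using expand.grid as one possible way to enumerate all 2^n patterns (the variable names here are illustrative and not from the original post):

```r
n <- 3
ops <- c("<", ">")

# combn on c("<", "<", "<", ">", ">", ">") keeps positional order, so after
# de-duplication only the patterns <<<, <<>, <>>, >>> remain
from_combn <- unique(t(combn(rep(ops, each = n), n)))
nrow(from_combn)

# expand.grid takes the Cartesian product of n copies of c("<", ">"),
# which covers every one of the 2^n operator patterns
from_grid <- expand.grid(rep(list(ops), n), stringsAsFactors = FALSE)
nrow(from_grid)
```

This only explains the missing combinations; it does not fix the separate problem that subset() was handed a character string instead of a logical expression.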

Simply run

dt[, sign_combi := do.call(paste, lapply(dt, sign))]

and you can then split on that column as needed, e.g.

split(dt, dt$sign_combi)

Trying to paste code together is a bad idea.

For example:

set.seed(47) # setting seed for reproducibility
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))

# create combination column (you could keep it separate if you prefer)
dt[, sign_combi := do.call(paste, lapply(dt, sign))]

# split original data by sign combinations
result = split(dt, dt$sign_combi)

# list of 8 resulting data tables
length(result)
# [1] 8

# peeking at the first three rows of the first three tables:
lapply(head(result, 3), head, 3)
# $`-1 -1 -1`
#             x          y          z sign_combi
# 1: -0.5713038 -0.7103555 -0.6873705   -1 -1 -1
# 2: -0.1407803 -0.8371153 -0.3686299   -1 -1 -1
# 3: -0.6478446 -0.7629461 -0.7458949   -1 -1 -1
# 
# $`-1 -1 1`
#             x          y         z sign_combi
# 1: -0.8070969 -0.3952283 0.9212030    -1 -1 1
# 2: -0.1190934 -0.4969318 0.8082232    -1 -1 1
# 3: -0.6536104 -0.3280965 0.6880454    -1 -1 1
# 
# $`-1 1 -1`
#              x         y          z sign_combi
# 1: -0.78789241 0.8577848 -0.7586369    -1 1 -1
# 2: -0.04442825 0.4736388 -0.3354734    -1 1 -1
# 3: -0.22105744 0.3012645 -0.4160631    -1 1 -1
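The example above applies sign to every column of dt. If the table also holds columns that should not take part in the classification, one option is to restrict the computation with .SDcols. The sketch below assumes a hypothetical extra id column and a hypothetical sign_cols vector, neither of which is in the original post:

```r
library(data.table)

set.seed(47)
dt <- data.table(id = 1:100,                  # hypothetical extra column
                 x = runif(100, -1, 1),
                 y = runif(100, -1, 1),
                 z = runif(100, -1, 1))

sign_cols <- c("x", "y", "z")  # assumed choice of columns to classify

# Apply sign() only to the chosen columns, then paste the signs row-wise
dt[, sign_combi := do.call(paste, lapply(.SD, sign)), .SDcols = sign_cols]

# One data.table per sign combination that actually occurs in the data
result <- split(dt, dt$sign_combi)
length(result)
```

Note that sign returns 0 for values that are exactly zero, so data containing zeros can produce more than 2^n groups, and combinations that never occur in the data produce no group at all.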

Great, works for me! I did not know about the sign function. FYI, data.table has added its own split method, which allows e.g. split(dt, by = "sign_combi", keep.by = FALSE) to drop the column used for splitting.
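A short sketch of that variant, reusing the sample data from the answer (by_sign is an illustrative name, not from the thread):

```r
library(data.table)

set.seed(47)
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))
dt[, sign_combi := do.call(paste, lapply(dt, sign))]

# data.table's split method: group by the helper column and drop it
# from each resulting table
by_sign <- split(dt, by = "sign_combi", keep.by = FALSE)

names(by_sign)        # the sign combinations present in the data
names(by_sign[[1]])   # only x, y, z remain in each piece
```

The original dt still carries sign_combi afterwards, since split returns new tables rather than modifying dt.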