Efficiently subsetting a data.frame based on another jagged data.frame
Tags: r, dataframe, data.table, subset


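I'm working on a project where I need to repeatedly subset a data.frame based on different combinations of attributes. Right now I'm using the merge function to do the subsetting, which works because I don't know until run time which attributes will be supplied. However, I'm wondering whether there is a faster way to create the subsets.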

require(data.table)
df <- structure(list(att1 = c("e", "a", "c", "a", "d", "e", "a", "d", "b", "a", "c", "a", "b", "e", "e", "c", "d", "d", "a", "e", "b"), 
                     att2 = c("b", "d", "c", "a", "e", "c", "e", "d", "e", "b", "e", "e", "c", "e", "a", "a", "e", "c", "b", "b", "d"), 
                     att3 = c("c", "b", "e", "b", "d", "d", "d", "c", "c", "d", "e", "a", "d", "c", "e", "a", "d", "e", "d", "a", "e"), 
                     att4 = c("c", "a", "b", "a", "e", "c", "a", "a", "b", "a", "a", "e", "c", "d", "b", "e", "b", "d", "d", "b", "e")), 
                .Names = c("att1", "att2", "att3", "att4"), class = "data.frame", row.names = c(NA, -21L))

#create combinations of attributes
#attributes to search through
cnames <- colnames(df)
att_combos <- data.table()
for(i in 2:length(cnames)){
  combos <- combn(cnames, i)
  for(x in 1:ncol(combos)){
    df_sub <- unique(df[,combos[1:nrow(combos), x]])
    att_combos <- rbind(att_combos, df_sub, fill = T)
  }
}
rm(df_sub, i, x, combos, cnames)
for(i in 1:nrow(att_combos)){
  att_sub <- att_combos[i, ]
  att_sub <- att_sub[, is.na(att_sub)==F, with = F]

  #need to subset data.frame here - very slow on large data.frames
  #anyway to speed this up?
  df_subset_for_analysis <- merge(df, att_sub)
}
I would use a data.table keyed on the columns you want to subset on, then build a second data.table (at run time) holding the combinations you are interested in, and then merge the two.

The example code further down shows one case with a single attribute combination (simple_combinations) and one with multiple attribute combinations (multiple_combinations).
Fyi, data.table has its own variant of expand.grid, CJ (though I'm not sure what the trade-offs between the two are). Also, you can merge with on= without setting keys:
simple = dt[.("d", "e", "d", "e"), on = paste0("att", 1:4), nomatch = 0]; mult = dt[CJ(att1 = "d", att2 = c("c", "d", "e"), att3 = "d", att4 = c("b", "e")), on = paste0("att", 1:4), nomatch = 0]

@Frank Thanks - I learned something new today about the amazing data.table package.

require(data.table)
df <- structure(list(att1 = c("e", "a", "c", "a", "d", "e", "a", "d", "b", "a", "c", "a", "b", "e", "e", "c", "d", "d", "a", "e", "b"), 
                 att2 = c("b", "d", "c", "a", "e", "c", "e", "d", "e", "b", "e", "e", "c", "e", "a", "a", "e", "c", "b", "b", "d"), 
                 att3 = c("c", "b", "e", "b", "d", "d", "d", "c", "c", "d", "e", "a", "d", "c", "e", "a", "d", "e", "d", "a", "e"), 
                 att4 = c("c", "a", "b", "a", "e", "c", "a", "a", "b", "a", "a", "e", "c", "d", "b", "e", "b", "d", "d", "b", "e")), 
            .Names = c("att1", "att2", "att3", "att4"), class = "data.frame", row.names = c(NA, -21L))

# Convert to data.table
dt <- data.table(df)
# Set key on the columns used for "subsetting"
setkey(dt, att1, att2, att3, att4)

# Simple subset on a single set of attributes
simple_combinations <- data.table(att1 = "d", att2 = "e", att3 = "d", att4 = "e")
setkey(simple_combinations, att1, att2, att3, att4)
# Merge to generate simple output subset (simple_combinations of att present in dt)
simple_subset <- merge(dt, simple_combinations)

# Complex (multiple) sets of attributes
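# stringsAsFactors = FALSE keeps the columns as character, matching the types in dt
multiple_combinations <- data.table(expand.grid(att1 = c("d"), att2 = c("c", "d", "e"),
  att3 = c("d"), att4 = c("b", "e"), stringsAsFactors = FALSE))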
setkey(multiple_combinations, att1, att2, att3, att4)
# Merge to generate  output subset (multiple_combinations of att present in dt)
multiple_subset <- merge(dt, multiple_combinations)
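
Since the question says the attributes are only known at run time, the on= join and CJ() mentioned in the comments can be combined so that the join columns are taken from whatever the combinations table contains. The following is a minimal sketch under assumptions: the small table and the helper name subset_by_combos are hypothetical and not part of the original answer, and whether this beats setkey plus merge on large data is something to benchmark on your own data.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the question's df)
dt <- data.table(
  att1  = c("d", "d", "a", "d"),
  att2  = c("e", "c", "b", "e"),
  att3  = c("d", "d", "c", "d"),
  att4  = c("b", "e", "a", "a"),
  value = 1:4
)

# Hypothetical helper: keep the rows of `dt` that match any row of `combos`,
# joining on whichever columns `combos` happens to contain.
# nomatch = 0 drops combinations that have no match in `dt`.
subset_by_combos <- function(dt, combos) {
  dt[combos, on = names(combos), nomatch = 0]
}

# One combination over two attributes, chosen at run time
one <- subset_by_combos(dt, data.table(att1 = "d", att2 = "e"))

# Several combinations over three attributes, built with CJ()
many <- subset_by_combos(dt, CJ(att1 = "d", att2 = c("c", "e"), att3 = "d"))

print(one)
print(many)
```

Because the join columns come from names(combos), the same helper would apply to each row of att_combos in the question's loop once the NA columns have been dropped, with no need to reset keys for every attribute set.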