R: How to subset multiple data frames by multiple variables



I want to subset five data frames based on a series of 31 variables. The data frames are stored in a list:

long_data_sets <- list(scale_g1, scale_g2, scale_g3, scale_g4, scale_g5)
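If the five data frames already exist as `scale_g1` … `scale_g5` in the workspace, one option is to build the list with names attached from the start, so every result derived from it carries labels such as `g1`. Below is a minimal sketch using base R's `mget()`. The two small data frames are hypothetical stand-ins for the real ones.

```r
# Hypothetical stand-ins for the real scale_g* data frames
scale_g1 <- data.frame(direction = c("N", "S"), response = 1:2)
scale_g2 <- data.frame(direction = c("E", "W"), response = 3:4)

# mget() looks up each name and returns a list named by those names
# (use 1:5 for the five real data frames)
long_data_sets <- mget(paste0("scale_g", 1:2), envir = environment())

# Shorten "scale_g1" to "g1", and so on
names(long_data_sets) <- sub("^scale_", "", names(long_data_sets))
names(long_data_sets)
```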

I want to subset the data frames by one of the 31 factor variables at a time, so that I end up with 5 * 31 = 155 new data frames.

I created a subsetting function that keeps only the two columns I need going forward ("direction" and "response"):

I tried applying the function with map2(), passing the list of 5 data frames and a list of the 31 factor names, but that clearly doesn't work:

> speeder_var <- names(scale_g1[53:83])
> map2(long_data_sets, speeder_var, create_speeder_data)
Error: `.x` (5) and `.y` (31) are different lengths
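The error is expected. `map2()` walks `.x` and `.y` in parallel: the first data frame is paired with the first variable, the second with the second, and so on. For that reason the two inputs must have the same length, or one of them must have length one. To pair every data frame with every variable, the 5 × 31 combinations have to be enumerated first.

The base-R sketch below does this with `expand.grid()` and `Map()` instead of `map2()`. The question does not show `create_speeder_data()`, so the two-argument version here is hypothetical (`y` is taken to be a column name). The dummy data is also hypothetical.

```r
# Hypothetical two-argument create_speeder_data(): keep rows where column `y`
# equals "Speeder" (which() drops NA comparisons, as subset() does) and only
# the direction/response columns
create_speeder_data <- function(x, y) {
  x[which(x[[y]] == "Speeder"), c("direction", "response"), drop = FALSE]
}

# Dummy inputs: 2 data frames and 3 factor columns (the real case is 5 and 31)
make_df <- function(n) {
  data.frame(direction = rep(c("N", "S"), length.out = n),
             response  = seq_len(n),
             speeder_a = rep(c("Speeder", "Not"), length.out = n),
             speeder_b = rep(c("Not", "Speeder"), length.out = n),
             speeder_c = "Speeder")
}
long_data_sets <- list(g1 = make_df(4), g2 = make_df(6))
speeder_var <- c("speeder_a", "speeder_b", "speeder_c")

# One row per (data frame, variable) pair
combos <- expand.grid(df = names(long_data_sets), var = speeder_var,
                      stringsAsFactors = FALSE)

# Map() still pairs its inputs element by element, but because `combos`
# lists every combination, every pair is visited exactly once
subsets <- Map(function(d, v) create_speeder_data(long_data_sets[[d]], v),
               combos$df, combos$var)
names(subsets) <- paste(combos$df, combos$var, sep = "_")
length(subsets)
```

With the real inputs, `combos` would have 5 * 31 = 155 rows, and `subsets[["g1_speeder_225"]]` would retrieve one piece without writing anything to the global environment.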


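The closest I could get was to take the y argument out of my function and apply it to the list of five data frames for a single one of the 31 factors: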

library(purrr)

#Create subsetting function for "speeder_225"
create_speeder_225_data <- function(x){
  subset(x, speeder_225 == "Speeder",
         select = c("direction", "response"))
}

#Map function to list of data frames
z_speeder_225 <- map(long_data_sets, create_speeder_225_data)

#Change names of new data frames in list
names(long_data_sets) <- c("g1", "g2", "g3", "g4", "g5")
names(z_speeder_225) <- paste0(names(long_data_sets), "speeder_225")

#Get data frames from list
list2env(z_speeder_225, envir = .GlobalEnv)
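As an alternative to writing 155 objects into the global environment with `list2env()`, the one-variable function can take the column name as an argument, and the results can stay in a nested list indexed as `result[["speeder_225"]][["g1"]]`. This is a minimal sketch: the helper name `subset_speeder()`, its `var` argument and the dummy data are assumptions for illustration. It uses `[` instead of `subset()`, because R's documentation describes `subset()` as a convenience for interactive use and recommends standard subsetting inside functions.

```r
# Hypothetical generalisation of create_speeder_225_data(): the factor column
# is passed by name instead of being hard-coded
subset_speeder <- function(x, var) {
  x[which(x[[var]] == "Speeder"), c("direction", "response"), drop = FALSE]
}

# Dummy data (assumption): two data frames, two factor columns
long_data_sets <- list(
  g1 = data.frame(direction = c("N", "S", "E"), response = 1:3,
                  speeder_225 = c("Speeder", "Not", "Speeder"),
                  speeder_300 = c("Not", "Not", "Speeder")),
  g2 = data.frame(direction = c("W", "N"), response = 4:5,
                  speeder_225 = c("Not", "Speeder"),
                  speeder_300 = c("Speeder", "Speeder"))
)
speeder_var <- c("speeder_225", "speeder_300")

# Outer loop over variables, inner loop over data frames; lapply() keeps the
# names of long_data_sets, and setNames() labels the outer level
result <- setNames(
  lapply(speeder_var, function(v) lapply(long_data_sets, subset_speeder, var = v)),
  speeder_var
)

result[["speeder_225"]][["g1"]]
```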

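I agree with @Mako212: you may need to rethink what you are trying to do. However, here is something that should work.

The code below subsets each dataset in the list. The test data has 5 categorical variables, each with two levels. Since the output is based only on level 1 ("speeding"), the result is 5 x 5 = 25 datasets, stored as a list of lists (5 x 5).

Here is a quick test to check that the code runs and subsets without errors: a simplified call that returns only the number of observations (rows) matching the subsetting criterion: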
> unlist(lapply(cols, function(z){
+     lapply(data_list, function(x){
+         return(x[get(z) == 'speeding', .(nrows = .N)])
+     })
+ }))
nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows nrows 
  113    82    24   112   185    97    63    22   110   193   103    78    35   115   197   110    74 
nrows nrows nrows nrows nrows nrows nrows nrows 
   26   103   194   107    84    25    97   191 
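Every element of that unlisted vector is named `nrows`, so it is hard to tell which count belongs to which dataset and variable. The base-R sketch below returns the same kind of check as a labelled matrix, with variables as rows and datasets as columns. It uses plain data frames instead of data.table, and the dummy data and list names are assumptions.

```r
# Dummy data (assumption): three data frames with two speeding columns each
set.seed(1)
make_dt <- function(n) {
  data.frame(direction = sample(c("North", "West", "South", "East"), n, replace = TRUE),
             speeder1  = sample(c("speeding", "not-speeding"), n, replace = TRUE),
             speeder2  = sample(c("speeding", "not-speeding"), n, replace = TRUE))
}
data_list <- list(d1 = make_dt(20), d2 = make_dt(35), d3 = make_dt(50))
cols <- paste0("speeder", 1:2)

# For each column z and each data frame x, count rows where z == "speeding".
# vapply() fixes each count to one integer; sapply() over cols then builds a
# datasets-by-columns matrix, which t() turns into columns-by-datasets
counts <- t(sapply(cols, function(z) {
  vapply(data_list, function(x) sum(x[[z]] == "speeding"), integer(1))
}))
counts
```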

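Why do you need 155 separate data frames? For many analyses it makes more sense to use grouping to separate the different factors. You may need to convert from wide to long first. By the way, if you showed a concrete example that illustrates the problem (it doesn't need to be your real data), it would be much easier to demonstrate the wide-to-long conversion and so on. For example, in Gautam's answer he created a list of data that the code can be run on. Besides that good example, here is some other guidance:

Thank you both for the comments. It does make sense not to create the data sets but to just use grouping and go from there. Combined with @Gautam's suggestion, this worked for me. I'll make sure to follow the guidelines when posting questions in the future.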
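The wide-to-long suggestion from the comments can be sketched in base R in three steps:

1. Stack the factor columns into a single `variable`/`value` pair.
2. Tag each row with its source dataset.
3. Let grouping do the work that 155 separate objects would otherwise do.

The column names and the dummy data below are assumptions, not the asker's real data.

```r
# Dummy wide data (assumption): two datasets, two speeder columns
long_data_sets <- list(
  g1 = data.frame(direction = c("N", "S", "E"), response = c(1, 2, 3),
                  speeder_225 = c("Speeder", "Not", "Speeder"),
                  speeder_300 = c("Not", "Speeder", "Speeder")),
  g2 = data.frame(direction = c("W", "N"), response = c(4, 5),
                  speeder_225 = c("Not", "Speeder"),
                  speeder_300 = c("Speeder", "Speeder"))
)
speeder_var <- c("speeder_225", "speeder_300")

# Wide to long: one row per (dataset, original row, speeder variable)
to_long <- function(df, dataset) {
  do.call(rbind, lapply(speeder_var, function(v) {
    data.frame(dataset = dataset, direction = df$direction,
               response = df$response, variable = v, value = df[[v]])
  }))
}
long <- do.call(rbind, Map(to_long, long_data_sets, names(long_data_sets)))

# Keep only the "Speeder" rows, then summarise by dataset and variable
# instead of creating one data frame per combination
speeders <- long[long$value == "Speeder", ]
agg <- aggregate(response ~ dataset + variable, data = speeders, FUN = mean)
agg
```

If separate pieces are still needed later, `split(speeders, list(speeders$dataset, speeders$variable))` returns them as one named list.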

library(data.table)

# Creating some dummy data
k  <- 100
directions <- rep(c('North', 'West', 'South', 'East'), each = k)
speeding <- rep(c('speeding', 'not-speeding'), each = k)

# Test data: one direction column plus five speeder columns, sampled with replacement
# (data_list below keeps number_of_observations under 4*k)
createDataTable <- function(number_of_observations = 50){
  data.table(direction = sample(x = directions, size = number_of_observations, replace = TRUE), 
             speeder1 = sample(x = speeding, size = number_of_observations, replace = TRUE), 
             speeder2 = sample(x = speeding, size = number_of_observations, replace = TRUE),
             speeder3 = sample(x = speeding, size = number_of_observations, replace = TRUE),
             speeder4 = sample(x = speeding, size = number_of_observations, replace = TRUE),
             speeder5 = sample(x = speeding, size = number_of_observations, replace = TRUE))
}

# Five test tables, each with a random number of rows between 50 and 4*k
data_list <- lapply(X = floor(runif(n = 5, min = 50, max = 4*k)), 
                    FUN = createDataTable)

# Subset dummy data based on one column at a time and return 
# the number of observations, direction, speeder2 and speeder3 from the subset 
cols <- paste0('speeder', 1:5)

ret <- lapply(cols, function(z){
  lapply(data_list, function(x){
    return(x[get(z) == 'speeding', .(nrows = .N, direction, speeder2, speeder3)])
  })
})

> summary(ret)
     Length Class  Mode
[1,] 5      -none- list
[2,] 5      -none- list
[3,] 5      -none- list
[4,] 5      -none- list
[5,] 5      -none- list
> summary(ret[[1]])
     Length Class      Mode
[1,] 4      data.table list
[2,] 4      data.table list
[3,] 4      data.table list
[4,] 4      data.table list
[5,] 4      data.table list
