R: split a data frame and map over the list

I have some data that looks like this:

library(sweep)

data <- bike_sales

data$group <- sample(1:4, 15644, replace = TRUE)


data %>% 
  split(.$group)

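Building on @antoine sac's suggestion of a threshold argument, I would suggest a list of parameters for each group. Each group gets a few pieces of metadata: a lower bound, or -Inf when there is none; an upper bound, or Inf when there is none; and whether to sample instead of filter. If you are sampling, you do only that instead of filtering.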

library(dplyr)
library(purrr)
library(sweep)
set.seed(1248)
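A minimal sketch of the per-group parameter list described above. It is an illustration under assumptions, not the answer's original code: `toy` is a small stand-in for `bike_sales`, and `params`, `sample_n` and `pick_rows` are hypothetical names. The range bounds are treated as inclusive.

```r
library(purrr)

# Toy stand-in for bike_sales: only `price` and `group` matter here (assumption).
set.seed(1248)
toy <- data.frame(price = c(1000, 2000, 4000, 6000, 8000, 1200, 3500, 9000),
                  group = c(1, 1, 3, 3, 4, 2, 2, 4))

# One entry per group: bounds for filtering, or a sample size when sampling.
params <- list(
  `1` = list(lower = -Inf, upper = 1500, sample_n = NA),
  `2` = list(lower = -Inf, upper = Inf,  sample_n = 1),
  `3` = list(lower = 3000, upper = 5000, sample_n = NA),
  `4` = list(lower = 7000, upper = Inf,  sample_n = NA)
)

# Sample when a sample size is given, otherwise filter on the price range.
pick_rows <- function(df, p) {
  if (!is.na(p$sample_n)) {
    df[sample(nrow(df), p$sample_n), , drop = FALSE]
  } else {
    df[df$price >= p$lower & df$price <= p$upper, , drop = FALSE]
  }
}

pieces <- split(toy, toy$group)
# Index params by the names of the split so each group meets its own settings.
result <- map2(pieces, params[names(pieces)], pick_rows)
```

Keeping the settings in a named list means a new group only needs a new entry in `params`; the function itself stays unchanged.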

Using the data.table package without splitting the data:

library(data.table)
setDT(data, key = "group")

fun <- function(x, grp, df) {
  if(grp == 1) df[x < 1500] else
    if(grp == 2) df[sample(nrow(df), 1)] else       # sample one row
      if(grp == 3) df[between(x, 3000, 5000)] else
        if(grp == 4) df[x > 7000]
}

data[, fun(price, .GRP, .SD), group]

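Consider base R without any splitting or mapping, using transform, merge, and subset. Specifically, merge in a separate data frame that specifies the lower and upper bound of each range, so that you can filter on them afterwards. For the special sampling of group 2, though, use row.names to build the grp2_sample object that the filter needs.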
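The base R idea can be sketched roughly as follows. This is an illustration under assumptions rather than the answer's original code: `toy` stands in for `bike_sales`, the names `ranges`, `rn` and `merged` are hypothetical, and the sample size of one row is arbitrary.

```r
# Toy stand-in for bike_sales with a `group` column (assumption).
set.seed(1248)
toy <- data.frame(price = c(1000, 2000, 4000, 6000, 8000, 1200, 3500, 9000),
                  group = c(1, 1, 3, 3, 4, 2, 2, 4))

# Lower/upper bounds for the groups that are filtered by range.
ranges <- data.frame(group = c(1, 3, 4),
                     lower = c(-Inf, 3000, 7000),
                     upper = c(1500, 5000, Inf))

# Keep the original row names in a column, because merge() does not preserve them.
toy$rn <- row.names(toy)

# Pick the sampled rows of group 2 up front, by row name.
grp2_rows <- toy$rn[toy$group == 2]
grp2_sample <- grp2_rows[sample(length(grp2_rows), 1)]

# all.x = TRUE keeps group 2, whose bounds come back as NA.
merged <- merge(toy, ranges, by = "group", all.x = TRUE)

# Keep rows inside their group's range, plus the sampled rows of group 2.
sub_df <- subset(merged,
                 (!is.na(lower) & price >= lower & price <= upper) |
                   rn %in% grp2_sample)
```

Sampling by row name before the merge keeps the random step separate from the range filter, so a single logical condition can select everything.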

There is even a data.table alternative:

library(data.table)
...
grp2_sample <- sample(rownames(bike_sales[bike_sales$pick == 2,]), 5)    # SAMPLE OF 5

sub_dt <- setDT(bike_sales)[, rn := row.names(bike_sales)][
                            data.table(group = c(1,3,4),
                                       lower = c(-Inf, 3000, 7000),
                                       upper = c(1500, 5000, Inf)), 
                            on="group", 
                            `:=`(lower=i.lower, upper=i.upper)
                           ][(price >= lower & price <= upper) | (rn %in% grp2_sample),]

You are not applying the same function to every element of the list, so there is no very elegant way to do this, with or without map. To avoid an ugly if/else or switch, I would use purrr::map2 and pass a list of thresholds as the second argument.
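In its simplest form, the map2 idea might look like the sketch below. The data frame `toy` and the `thresholds` list are assumptions made for illustration. Group 2 is given the range c(-Inf, Inf), so it passes through unfiltered here, and its sampling would have to be handled separately.

```r
library(purrr)

# Toy stand-in for bike_sales: only `price` and `group` matter here (assumption).
toy <- data.frame(price = c(1000, 2000, 4000, 6000, 8000, 1200, 3500, 9000),
                  group = c(1, 1, 3, 3, 4, 2, 2, 4))

# One c(lower, upper) pair per group, in the order split() returns the groups.
thresholds <- list(c(-Inf, 1500), c(-Inf, Inf), c(3000, 5000), c(7000, Inf))

# .x is one group's data frame, .y is that group's pair of bounds.
filtered <- map2(split(toy, toy$group), thresholds,
                 ~ .x[.x$price >= .y[1] & .x$price <= .y[2], , drop = FALSE])
```

Because the two lists are matched by position, the order of `thresholds` has to follow the sorted group values.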
