
R: Grouping data by tolerance using a list of indices

Tags: r, match, grouping

I don't know how to explain this briefly, so I'll do my best. I have the following sample data:


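In general: avoid deleting rows from, or adding rows to, a data frame one at a time inside a loop. Because of how R manages memory, every time you add or remove a row another copy of the data frame is made. Garbage collection eventually discards the old copies, but the garbage accumulates quickly and degrades performance. Instead, add a logical column to the data frame and set it to TRUE for the rows that have been extracted. So, like this: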

Data$extracted <- rep(FALSE,nrow(Data))
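As a minimal sketch of the flag-instead-of-delete idea, the snippet below marks rows as extracted and then selects the ones that are still available. The sample values are hypothetical (the question's real data is not reproduced here), and `remaining` is just an illustrative name.

```r
# Hypothetical sample data; not the asker's actual data.
Data <- data.frame(A = c(1, 2, 10, 11, 30),
                   B = c("a", "b", "c", "d", "e"),
                   C = 1:5)

# Add the logical flag column once, before any loop.
Data$extracted <- rep(FALSE, nrow(Data))

# Instead of deleting rows 1 and 2, mark them as extracted.
Data$extracted[c(1, 2)] <- TRUE

# Rows still available for later iterations; Data itself keeps all 5 rows.
remaining <- Data[!Data$extracted, ]
```

The data frame is never resized inside the loop; only the logical column changes.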
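As for your question: I get a different set of group numbers, but the groupings themselves are the same.

There may be a more elegant way to do this, but this will get it done: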

# store results in a separate list
res <- list()

group.counter <- 1

# loop until they're all done.
for(idx in Ind$I) {
  # skip this iteration if idx is NA.
  if(is.na(idx)) {
    next
  }

  # dat.rows is a logical vector which shows the rows where 
  # "A" meets the tolerance requirement.
  # specify the tolerance here.
  mytol <- 1
  # the next line relies on exact equality, so it only works for integer compare.
  # also not covered: what if multiple values of C 
  # match idx? do we loop over each corresponding value of A, 
  # i.e. loop over each value of 'target'?
  target <- Data$A[Data$C == idx]

  # use the magic of vectorized logical compare.
  dat.rows <- 
    ( (Data$A - target) >= -mytol) & 
    ( (Data$A - target) <= mytol) & 
    ( ! Data$extracted)
  # if dat.rows is all false, then nothing met the criteria.
  # skip the rest of the loop
  if( ! any(dat.rows)) {
    next
  }

  # copy the rows to the result list.
  res[[length(res) + 1]] <- data.frame(
    A=Data[dat.rows,"A"],
    B=Data[dat.rows,"B"],
    C=Data[dat.rows,"C"],
    Group=group.counter # this value will be recycled to match length of A, B, C.
  )

  # flag the extraction.
  Data$extracted[dat.rows] <- TRUE
  # increment the group counter
  group.counter <- group.counter + 1
}

# now make a data.frame from the results.
# this is the last step in how we avoid 
#"growing" a data.frame inside a loop.
resData <- do.call(rbind, res)
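To try the loop above end to end, something like the following could be used. This is a sketch under stated assumptions: the values in `Data` and `Ind` are hypothetical, `group_by_tolerance` is an illustrative wrapper name that does not appear in the original answer, `abs()` stands in for the two-sided comparison, and when several rows share the same `C` only the first one is used as the target (one possible answer to the open question in the comments, not necessarily the intended one).

```r
# Hypothetical inputs; the question's real data is not shown in the source.
Data <- data.frame(A = c(10, 11, 12, 20, 21, 35),
                   B = letters[1:6],
                   C = 1:6)
Ind <- data.frame(I = c(2, NA, 4, 5, 6))

# Illustrative wrapper (the name is an assumption) around the same loop.
group_by_tolerance <- function(Data, indices, tol = 1) {
  extracted <- rep(FALSE, nrow(Data))  # flag vector instead of deleting rows
  res <- list()
  group.counter <- 1
  for (idx in indices) {
    if (is.na(idx)) next
    # Assumption: use only the first row whose C equals idx.
    target <- Data$A[which(Data$C == idx)[1]]
    if (is.na(target)) next            # no row of C matched idx
    rows <- abs(Data$A - target) <= tol & !extracted
    if (!any(rows)) next               # everything in range is already grouped
    res[[length(res) + 1]] <- data.frame(Data[rows, c("A", "B", "C")],
                                         Group = group.counter)
    extracted[rows] <- TRUE
    group.counter <- group.counter + 1
  }
  # Build the result once, outside the loop.
  do.call(rbind, res)
}

resData <- group_by_tolerance(Data, Ind$I, tol = 1)
print(resData)
```

With these made-up values, index 5 produces no new group because its neighbours were already claimed by index 4, which shows that the result depends on the order of the indices.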

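Note that this does not produce an optimal grouping; there are packages for cluster analysis. But if it meets your needs, it is good enough.

Thanks again. Which package would you recommend?

I would start with the cluster package and give it a try on your data, to see whether those techniques offer a potential improvement. There are many more packages, with all sorts of features. See here: and here:

Thanks a lot for the help!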
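As a starting point for the cluster-analysis route, here is a sketch that uses `hclust()` and `cutree()` from base R's stats package rather than the cluster package mentioned above. With complete linkage, cutting the tree at height `h` yields groups in which no two members differ by more than `h`. The values of `A` and the tolerance `tol` are hypothetical.

```r
# Hypothetical values of A; not the asker's data.
A <- c(10, 11, 12, 20, 21, 35)
tol <- 2  # assumed maximum spread allowed within a group

# Complete linkage: the merge height is the largest pairwise distance
# between members of the two clusters being joined.
hc <- hclust(dist(A), method = "complete")

# Cut the tree so that no group spans more than 'tol'.
Group <- cutree(hc, h = tol)

print(data.frame(A = A, Group = Group))
```

Unlike the loop, this does not depend on the order of an index list, but it also ignores the indices entirely, so it answers a slightly different question.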