R: Create a group number for each consecutive sequence

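I have the data.frame below. I want to add a column "g" that classifies the data according to the consecutive sequences in column h_no. That is, as shown in the last column "g", the first sequence of h_no (1, 2, 3, 4) is group 1, the second sequence of h_no (1 to 7) is group 2, and so on.
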
h_no   h_freq    h_freqsq g
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1  
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2 
1     0.13636 0.018594050 3 
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3
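The answers below refer to this data under different names (data, df, A, Data.frame, mytb). To give a reproducible starting point, here is a sketch that rebuilds the table above as a data frame called `data`. The question does not name the data frame, so that name is an assumption. The sketch also builds the g column the question asks for as `g_expected`, which is likewise a hypothetical name:

```r
# Hypothetical reconstruction of the question's table; the name `data` is assumed.
data <- data.frame(
  h_no     = c(1:4, 1:7, 1:4),
  h_freq   = c(0.09091, 0, 0.04545, 0,
               0.13636, 0, 0, 0.04545, 0.31818, 0, 0.5,
               0.13636, 0.09091, 0.40909, 0.04545),
  h_freqsq = c(0.008264628, 0, 0.002065702, 0,
               0.018594050, 0, 0, 0.002065702, 0.101238512, 0, 0.25,
               0.018594050, 0.008264628, 0.167354628, 0.002065702)
)

# The desired "g" column: one group per consecutive run of h_no (lengths 4, 7, 4).
g_expected <- rep(1:3, times = c(4, 7, 4))
str(data)
```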

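You can add columns to your data using various techniques. The quote below comes from the "Details" section of the relevant help text, [.data.frame:

Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list. When [ and [[ are used with two indices (x[i, j] and x[[i, j]]), they act like indexing a matrix.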
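To illustrate the two modes, here is a small sketch that adds a group column g to a toy data frame. The name `dat` is an assumption made for illustration, and so is computing g with `cumsum(dat$h_no == 1)`, which is borrowed from an answer further down. The list-style forms use a single index and the matrix-style form uses two. All of them end up with the same column:

```r
dat <- data.frame(h_no = c(1:4, 1:7, 1:4))   # assumed toy data: just the h_no column
g <- cumsum(dat$h_no == 1)                   # group numbers, one per row

a <- dat; a["g"]   <- g   # list-style: single index selects a column
b <- dat; b[["g"]] <- g   # list-style, [[ ]] form
m <- dat; m[, "g"] <- g   # matrix-style: two indices (all rows, column "g")
d <- dat; d$g      <- g   # $ form, also list-style

# Every variant produces the same new column.
stopifnot(identical(a$g, b$g), identical(a$g, m$g), identical(a$g, d$g))
```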
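In addition to Roman's answer, something like this might be simpler. Note that I didn't test it, because I don't have access to R right now.
# Note that I use a global variable here:
# normally not advisable, but I liked it
# here because it makes the code shorter.
# Assumes df holds the question's data frame.
index <- 0
new_column <- sapply(df$h_no, function(x) {
  # <<- updates the global index; a plain = would only create a local copy
  if (x == 1) index <<- index + 1
  return(index)
})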
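If you'd rather not touch the global environment at all, a closure can hold the counter instead. This sketch is a variation on the answer, not the answer's own code. `make_counter` is a hypothetical helper name, and `h_no` is the question's column typed in by hand:

```r
h_no <- c(1:4, 1:7, 1:4)   # assumed sample data: the question's h_no column

# Hypothetical helper: the counter lives in the closure's own environment,
# so nothing is written to the global environment.
make_counter <- function() {
  index <- 0
  function(x) {
    if (x == 1) index <<- index + 1   # updates the enclosing 'index', not a global
    index
  }
}

# make_counter() is evaluated once, so every call shares the same counter.
new_column <- sapply(h_no, make_counter())
print(new_column)
```

A fresh counter is created each time `make_counter()` is called, so repeated runs don't need a manual reset.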

If I understand the question correctly, you want to detect when h_no does not increase, and then increment the class.
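(I'll walk through how I'd solve this problem; there's a self-contained function at the end.)
Working
For now we only care about the h_no column, so we can extract it from the data frame: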
> h_no <- data$h_no

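With diff we get the differences between consecutive elements, and from those it's easy to find the ones that aren't positive:
> d.h_no <- diff(h_no)
> nonpos <- d.h_no <= 0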
> nonpos
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[13] FALSE FALSE

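However, there are two problems: the numbers are one too small, and we're missing the first element (there should be four in the first class).
The first problem is solved simply with 1 + cumsum(nonpos). The second only requires putting a 1 at the front of the vector, since the first element is always in class 1: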
 > classes <- c(1, 1 + cumsum(nonpos))
 > classes
  [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

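Binding this to the data frame with cbind(data, class=classes) gives data_w_classes, which now contains the result.
Final result
We can squash these lines together and wrap it all up in a function to make it easier to use: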
classify <- function(data) {
   cbind(data, class=c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}

You can use the function either way:
> classified <- classify(data) # doesn't overwrite data
> data <- classify(data) # data now has the "class" column

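Easy: if your data frame is A,
b <- A[,1]       # the first column, h_no
b <- b==1        # TRUE wherever a new sequence starts
b <- cumsum(b)   # running count of starts = group number

b now holds the group numbers.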
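The cumsum(b == 1) idea relies on every sequence restarting at exactly 1. The diff-based classify from the previous answer instead starts a new group at any non-increase. The comparison sketch below shows the difference; `by_ones`, `by_diff` and the inputs `h1` and `h2` are hypothetical names and data chosen for illustration:

```r
h1 <- c(1:4, 1:7, 1:4)   # the question's pattern
h2 <- c(1:4, 2:5)        # hypothetical input where the second run restarts at 2

by_ones <- function(h) cumsum(h == 1)                  # new group at every 1
by_diff <- function(h) c(1, 1 + cumsum(diff(h) <= 0))  # new group at every non-increase

by_ones(h1)   # 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
by_diff(h1)   # 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
by_ones(h2)   # 1 1 1 1 1 1 1 1  (the restart at 2 is missed)
by_diff(h2)   # 1 1 1 1 2 2 2 2
```

For the question's data both agree. The diff-based version is the safer choice if a sequence could start at some other value.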

I think cbind is the simplest way to add a column to a data frame in R. Here is an example:
    myDf = data.frame(index=seq(1,10,1), Val=seq(1,10,1))
    newCol= seq(2,20,2)
    myDf = cbind(myDf,newCol)
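Applied to the question, naming the argument to cbind sets the new column's name directly. In this sketch, the toy data frame `h` stands in for the question's data, and the group numbers again come from `cumsum(h_no == 1)`:

```r
h <- data.frame(h_no = c(1:4, 1:7, 1:4))   # assumed toy data: the question's h_no column

# The argument name "g" becomes the column name (instead of e.g. "newCol").
h <- cbind(h, g = cumsum(h$h_no == 1))
names(h)   # "h_no" "g"
```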

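An approach based on identifying the number of groups (x in mapply) and their lengths (y in mapply); the full code, using the question's data read in as mytb, is further down.

The data.table function rleid is very handy for things like this. We subtract the sequence 1:nrow(data) to turn each consecutive sequence into a constant, then use rleid to create the group IDs: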
data$g = data.table::rleid(data$h_no - 1:nrow(data))
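To see why subtracting 1:nrow(data) works: within a run of consecutive integers, h_no and the row index both go up by one per row, so their difference stays constant. It changes whenever h_no resets. If you'd prefer not to depend on data.table, here is a base-R sketch of the same idea. The base replacements for rleid are my assumption and are not part of the answer:

```r
h_no <- c(1:4, 1:7, 1:4)            # assumed sample data (the question's h_no column)

key <- h_no - seq_along(h_no)       # constant within each consecutive run: 0, -4, -11
# Base-R stand-in for data.table::rleid(): a new id wherever key changes.
g <- cumsum(c(TRUE, diff(key) != 0))

# Equivalent using rle(): number the runs of equal key values.
r  <- rle(key)
g2 <- rep(seq_along(r$lengths), times = r$lengths)
g
```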

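What's the difference between the last two ways of adding a column? @huon-dbaupp The method with the comma is explicit and also works for matrices, while the last one only works for data.frames. If no comma is given, R assumes you mean columns.
Nice and short. I would just change the last line: rather than cumsum(b) -> b, add the result directly as a column of the original data frame, something like A$groups <- cumsum(b).
Wouldn't cumsum(b) give you a vector of length 3, or am I missing something? @RomanLuštrik, see this explanation of how cumsum works in this case. @RomanLuštrik, this solution can be nicely rewritten in one line. Using your.df data, you can simply do your.df$group = cumsum(your.df[,1] == 1) to get the new group column.
I like the hack using a global variable. So C-ish. :P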
The cbind step from the walkthrough above, which creates data_w_classes:

 > data_w_classes <- cbind(data, class=classes)

If you want the class stored as a factor, classify can be written as:

classify <- function(data) {
   cbind(data, class=factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}

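The Data.frame[,'h_new_column'] answer:

# Caution: as.integer() has no breaks argument, so this appears to just copy
# h_no into the new column rather than assign group numbers; binning with
# breaks would need cut(), and fixed breaks would not generalize anyway.
Data.frame[,'h_new_column'] <- as.integer(Data.frame[,'h_no'], breaks=c(1, 4, 7))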

The full mapply approach, using the question's data read in as mytb:

mytb <- read.table(text="h_no  h_freq  h_freqsq group
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2
1     0.13636 0.018594050 3
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3", header=TRUE, stringsAsFactors=FALSE)
mytb$group <- NULL   # drop the expected answer so it can be recomputed

# Rows where a new group starts. which(h_no == 1) is safer than
# grep(1, mytb$h_no), which would also match values such as 10 or 21.
positionsof1s <- which(mytb$h_no == 1)

mytb$newgroup <- unlist(mapply(function(x, y)
  rep(x, y),                       # repeat group number x, y times
  x = 1:length(positionsof1s),     # group numbers: 1 to the number of groups (g1:g3)
  y = c(diff(positionsof1s),       # lengths of all groups but the last (g1, g2) = 4, 7
        nrow(mytb) -               # length of the last group (g3): number of rows
          (positionsof1s[length(positionsof1s)] - 1)),  # minus the rows before g3 starts
  SIMPLIFY = FALSE))               # always return a list, even if all groups are equally long
mytb
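The mapply call above essentially builds rep(group, length) for each group. Because rep() accepts a whole vector of times, the same result can be sketched without mapply. Like the original, this assumes that every group starts at h_no == 1 and that the first row is a 1, since rows before the first 1 would not be covered. The names `starts` and `run_len` are hypothetical:

```r
h_no <- c(1:4, 1:7, 1:4)                        # assumed sample data (the question's h_no)

starts  <- which(h_no == 1)                     # row where each group begins: 1, 5, 12
run_len <- diff(c(starts, length(h_no) + 1))    # rows per group: 4, 7, 4
newgroup <- rep(seq_along(starts), times = run_len)
newgroup
```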
