From which row does a data.frame variable have a constant value? - R, Time Series, dplyr - Fatal编程技术网

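I want to compute the mean of one variable in an R data.frame, starting from the row at which another variable begins to have a constant value. I usually use dplyr for this kind of database task, but I don't know how to do this one. Here is an example: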

s<-"no Spc PSize
2                0           6493
2                0           9281
2               12          26183
2               12          36180
2               12          37806
2               12          37765
3               12          36015
3               12          26661
3                0          14031
3                0           5564
3                1          17701
3                1          20808
3                1          31511
3                1          44746
3                1          50534
3                1          54858
3                1          58160
3                1          60326"

d<-read.delim(textConnection(s),sep="",header=T)

mean(d[1:10,3])
sd(d[1:10,3])

I can compute it by hand like this, but that's not the idea…

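You can do this by adding a column that checks whether each entry matches the value above it, and then using `cumsum` to find where the count changes. I then group by that count and compute the summaries you want. I also added an output column listing the rows, to show where each summary comes from: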

library(dplyr)

d %>%
  mutate(
    row = 1:n()
    , isDiff = Spc != lag(Spc, default = Spc[1])
    , whichGroup = cumsum(isDiff)) %>%
  group_by(whichGroup, Spc) %>%
  summarise(mean = mean(PSize)
            , sd = sd(PSize)
            , whichRows = paste(range(row), collapse = ":"))
gives:

  whichGroup   Spc    mean        sd whichRows
       <int> <int>   <dbl>     <dbl>     <chr>
1          0     0  7887.0  1971.414       1:2
2          1    12 33435.0  5486.794       3:8
3          2     0  9797.5  5987.073      9:10
4          3     1 42330.5 16866.591     11:18
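To see what the `isDiff` and `whichGroup` columns contain before grouping, here is a minimal base-R sketch of the same change-flag-plus-`cumsum` idea. It assumes the example data from the question (rebuilt inline so the snippet runs on its own) and uses `diff()` in place of dplyr's `lag()` comparison; `tapply()` stands in for `group_by()`/`summarise()`. It is an illustration of the technique, not the answerer's code.

```r
# Example data copied from the question, rebuilt so the snippet runs on its own
d <- data.frame(
  no    = c(rep(2L, 6), rep(3L, 12)),
  Spc   = c(0L, 0L, rep(12L, 6), 0L, 0L, rep(1L, 8)),
  PSize = c(6493, 9281, 26183, 36180, 37806, 37765, 36015, 26661,
            14031, 5564, 17701, 20808, 31511, 44746, 50534, 54858,
            58160, 60326)
)

# TRUE where Spc differs from the previous row; the first row has no
# predecessor, so it is FALSE (like lag(Spc, default = Spc[1]) above)
isDiff <- c(FALSE, diff(d$Spc) != 0)

# Running count of changes: each run of identical Spc values gets its own id
whichGroup <- cumsum(isDiff)
print(whichGroup)

# Per-run mean and sd of PSize, the same numbers as the grouped summarise()
groupMeans <- tapply(d$PSize, whichGroup, mean)
groupSds   <- tapply(d$PSize, whichGroup, sd)
print(groupMeans)
print(groupSds)
```

Here `whichGroup` comes out as 0 0 1 1 1 1 1 1 2 2 3 3 3 3 3 3 3 3, which lines up with the `whichRows` column (1:2, 3:8, 9:10, 11:18).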
Based on the comments, it seems you want the last group versus all the other groups, which you can get with:

d %>%
  mutate(
    row = 1:n()
    , isDiff = Spc != lag(Spc, default = Spc[1])
    , whichGroup = cumsum(isDiff)) %>%
  group_by(isLast = whichGroup == max(whichGroup)) %>%
  summarise(mean = mean(PSize)
            , sd = sd(PSize)
            , whichRows = paste(range(row), collapse = ":"))
which gives:

  isLast    mean       sd whichRows
   <lgl>   <dbl>    <dbl>     <chr>
1  FALSE 23597.9 13521.32      1:10
2   TRUE 42330.5 16866.59     11:18

Option 1: use `rleid` from the data.table package:

library(dplyr)
library(data.table)  # for rleid()

d %>% 
  group_by(rlid = rleid(Spc)) %>% 
  summarise(mean_size = mean(PSize), sd_size = sd(PSize)) %>% 
  slice(n())
gives:

# A tibble: 1 × 3
   rlid mean_size  sd_size
  <int>     <dbl>    <dbl>
1     4   42330.5 16866.59
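If you would rather not load data.table just for `rleid()`, the same run ids for a single column can be sketched in base R with `rle()` and `rep()`. The helper name `run_id` is hypothetical, and unlike `rleid()`, which accepts several columns, this sketch handles only one vector.

```r
# Spc column copied from the question's example data
Spc <- c(0L, 0L, rep(12L, 6), 0L, 0L, rep(1L, 8))

# Hypothetical helper: base-R stand-in for data.table::rleid() on one vector
run_id <- function(x) {
  r <- rle(x)
  # repeat run number i exactly lengths[i] times
  rep(seq_along(r$lengths), r$lengths)
}

ids <- run_id(Spc)
print(ids)       # 1 1 2 2 2 2 2 2 3 3 4 4 4 4 4 4 4 4
print(max(ids))  # id of the last run, as in the rlid column above
```

Inside the pipeline it could replace the data.table call, e.g. `group_by(rlid = run_id(Spc))`.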

Option 2: use base R's `rle` to find the row where the last run of `Spc` starts, and `slice` from there to the end:

# run lengths of Spc are 2 6 2 8, so the last run starts at row 2 + 6 + 2 + 1 = 11
startrow <- sum(head(rle(d$Spc)$lengths, -1)) + 1
d %>%
  slice(startrow:n()) %>% 
  summarise(mean_size = mean(PSize), sd_size = sd(PSize))
gives:

  mean_size  sd_size
1   42330.5 16866.59

Option 3: if you want to compute both groups (the last group and all the others), use `group_by` instead of filtering rows, and build a new grouping vector (`rep_vec`) with `rle`:

rep_vec <- c(sum(head(rle(d$Spc)$lengths, -1)), tail(rle(d$Spc)$lengths, 1))

d %>%
  group_by(grp = rep(c('others','last_group'), rep_vec)) %>% 
  summarise(mean_size = mean(PSize), sd_size = sd(PSize))
gives:

         grp mean_size  sd_size
       (chr)     (dbl)    (dbl)
1 last_group   42330.5 16866.59
2     others   23597.9 13521.32

If you want to include the rows, you can change the code to:

d %>%
  mutate(rn = row_number()) %>% 
  group_by(grp = rep(c('others','last_group'), rep_vec)) %>% 
  summarise(mean_size = mean(PSize), sd_size = sd(PSize), rows = paste0(range(rn), collapse=':'))
which gives:

         grp mean_size  sd_size  rows
       <chr>     <dbl>    <dbl> <chr>
1 last_group   42330.5 16866.59 11:18
2     others   23597.9 13521.32  1:10
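To make the `rep_vec` step of Option 3 concrete: with the example data the run lengths of `Spc` are 2, 6, 2, 8, so `rep_vec` is `c(10, 8)` and the grouping vector labels rows 1 to 10 "others" and rows 11 to 18 "last_group". The base-R sketch below reproduces that grouping with `tapply()` instead of dplyr; the data are copied from the question.

```r
# Columns copied from the question's example data
PSize <- c(6493, 9281, 26183, 36180, 37806, 37765, 36015, 26661,
           14031, 5564, 17701, 20808, 31511, 44746, 50534, 54858,
           58160, 60326)
Spc <- c(0L, 0L, rep(12L, 6), 0L, 0L, rep(1L, 8))

lens <- rle(Spc)$lengths                          # 2 6 2 8
# rows before the last run, rows in the last run (0 and n if Spc never changes)
rep_vec <- c(sum(head(lens, -1)), tail(lens, 1))  # 10 8
grp <- rep(c("others", "last_group"), rep_vec)

# Per-group summaries; tapply() sorts the labels, so "last_group" comes first
means <- tapply(PSize, grp, mean)
sds   <- tapply(PSize, grp, sd)
print(means)
print(sds)
```

The means are 42330.5 for the last group and 23597.9 for the others, matching the dplyr output above.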

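So you want to find the index at which the vector starts being constant? You can take the `diff()` of the vector and find the first place, counting from the end, where it differs from zero. For example: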

vec <- c(1,2,3,4,5,5,5,6,6,6)
diff(vec)
differences <- rev(diff(vec))

# position of the first non-zero difference, counting from the end
min.dist <- min(which(differences != 0))

# convert back to an index counted from the start (8 for this vec)
length(vec) - min.dist + 1
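The same `diff()` idea can be wrapped in a small function and applied to the question's `Spc` column. This is a sketch: the name `start_of_last_run` is hypothetical, it finds the last non-zero difference directly with `max()` instead of reversing the vector, and it adds a case the snippet above does not cover. If the vector never changes, `which()` returns nothing and `min()` would give `Inf` with a warning, so the function returns 1 instead.

```r
# Hypothetical helper: index where the final run of identical values begins
start_of_last_run <- function(vec) {
  changes <- which(diff(vec) != 0)  # positions i where vec[i + 1] != vec[i]
  if (length(changes) == 0) {
    return(1L)                      # constant vector: the last run is everything
  }
  max(changes) + 1L                 # first element after the last change
}

start_of_last_run(c(1, 2, 3, 4, 5, 5, 5, 6, 6, 6))  # 8, as in the snippet above

# Applied to the question's data (columns copied from the example)
Spc   <- c(0L, 0L, rep(12L, 6), 0L, 0L, rep(1L, 8))
PSize <- c(6493, 9281, 26183, 36180, 37806, 37765, 36015, 26661,
           14031, 5564, 17701, 20808, 31511, 44746, 50534, 54858,
           58160, 60326)
start <- start_of_last_run(Spc)     # 11
mean(PSize[start:length(PSize)])    # 42330.5
sd(PSize[start:length(PSize)])
```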

vec那么您想找到中间向量开始为常数的索引?您可以获取向量的
diff()
,并第一次查看它是否不同于零。比如说,

vec <- c(1,2,3,4,5,5,5,6,6,6)
diff(vec)
differences <- rev(diff(vec))

# distance from the end of first non-zero
min.dist <- min(which(differences != 0))

# take difference
length(vec) - min.dist + 1

I want to calculate two groups, the last group and the others.
@Leosar: see the latest edit for a version that compares the last group with all the other groups. In my opinion this approach is more flexible than @Draglistatus Maximus's one.
@Leosar I think Mark's approach is good, but IMO it is neither more nor less flexible.