From which row does a data.frame variable have a constant value
Tags: r, time-series, dplyr
I want to compute the mean of one variable in an R data.frame over the rows from which another variable starts holding a constant value. I usually use dplyr for this kind of data manipulation task, but I can't work out how to do it here. This is an example:
s<-"no Spc PSize
2 0 6493
2 0 9281
2 12 26183
2 12 36180
2 12 37806
2 12 37765
3 12 36015
3 12 26661
3 0 14031
3 0 5564
3 1 17701
3 1 20808
3 1 31511
3 1 44746
3 1 50534
3 1 54858
3 1 58160
3 1 60326"
d <- read.delim(textConnection(s), sep = "", header = TRUE)
# rows 1:10 picked by hand
mean(d[1:10, 3])
sd(d[1:10, 3])
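I can compute this by hand, but that is not what I'm after…

You can add a column that checks whether each entry matches the value above it, then use cumsum to find where the value changes. I group by that counter and compute the summaries you want. I also added a column with the row range to show where each summary comes from: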
library(dplyr)

d %>%
  mutate(
    row = 1:n()
    , isDiff = Spc != lag(Spc, default = Spc[1])
    , whichGroup = cumsum(isDiff)) %>%
  group_by(whichGroup, Spc) %>%
  summarise(mean = mean(PSize)
            , sd = sd(PSize)
            , whichRows = paste(range(row), collapse = ":"))
This gives:
  whichGroup   Spc    mean        sd whichRows
       <int> <int>   <dbl>     <dbl>     <chr>
1          0     0  7887.0  1971.414       1:2
2          1    12 33435.0  5486.794       3:8
3          2     0  9797.5  5987.073      9:10
4          3     1 42330.5 16866.591     11:18
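To see why the cumsum trick labels runs, here is a minimal base R sketch on a toy vector. The vector `x` and the names `is_diff`, `which_group` and `last_run_rows` are made up for illustration and are not part of the answer's code.

```r
# Toy vector standing in for the Spc column (values chosen for illustration)
x <- c(0, 0, 12, 12, 12, 0, 1, 1)

# TRUE wherever a value differs from the one before it; the first element
# has no predecessor, so it is marked FALSE
is_diff <- c(FALSE, x[-1] != x[-length(x)])

# Running count of changes: constant within a run, +1 at every change
which_group <- cumsum(is_diff)
print(which_group)

# The last run is the one carrying the highest counter value
last_run_rows <- which(which_group == max(which_group))
print(last_run_rows)
```

Because the counter only increases when the value changes, two separate runs of the same value (the two runs of 0 here) get different ids, which is what distinguishes this from a plain group_by on the value.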
Based on the comments, it looks like you want the last group compared with all the other groups, which you can get with:
d %>%
  mutate(
    row = 1:n()
    , isDiff = Spc != lag(Spc, default = Spc[1])
    , whichGroup = cumsum(isDiff)) %>%
  group_by(isLast = whichGroup == max(whichGroup)) %>%
  summarise(mean = mean(PSize)
            , sd = sd(PSize)
            , whichRows = paste(range(row), collapse = ":"))
This gives:
  isLast    mean       sd whichRows
   <lgl>   <dbl>    <dbl>     <chr>
1  FALSE 23597.9 13521.32      1:10
2   TRUE 42330.5 16866.59     11:18
Option 1: use rleid from the data.table package:
library(dplyr)
library(data.table)

d %>%
  group_by(rlid = rleid(Spc)) %>%
  summarise(mean_size = mean(PSize), sd_size = sd(PSize)) %>%
  slice(n())
This gives:
# A tibble: 1 × 3
   rlid mean_size  sd_size
  <int>     <dbl>    <dbl>
1     4   42330.5 16866.59
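If data.table is not available, the same run ids can be built from base R's rle. This is a small sketch on toy data; `x`, `y`, `run_id` and `last_mean` are illustrative names, not part of the answer.

```r
# Toy data: x plays the role of Spc, y the role of PSize
x <- c(0, 0, 12, 12, 12, 0, 1, 1)
y <- c(10, 20, 30, 40, 50, 60, 70, 90)

# rle() gives the length of each run; repeating 1, 2, 3, ... by those
# lengths yields one id per run
r <- rle(x)
run_id <- rep(seq_along(r$lengths), times = r$lengths)
print(run_id)

# Mean of y over the last run only
last_mean <- mean(y[run_id == max(run_id)])
print(last_mean)
```

The resulting `run_id` can be passed to group_by in the same position as `rleid(Spc)`.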
Option 2: use rle to find the row where the last run starts, then keep only the rows from there onwards:
startrow <- sum(head(rle(d$Spc)$lengths, -1)) + 1
d %>%
  slice(startrow:n()) %>%
  summarise(mean_size = mean(PSize), sd_size = sd(PSize))
This gives:
  mean_size  sd_size
1   42330.5 16866.59
Option 3: if you want to compute both groups (the last group and all the others), use group_by rather than filtering rows, and use rle to build a vector of group sizes (rep_vec):
rep_vec <- c(sum(head(rle(d$Spc)$lengths, -1)), tail(rle(d$Spc)$lengths, 1))
d %>%
  group_by(grp = rep(c('others','last_group'), rep_vec)) %>%
  summarise(mean_size = mean(PSize), sd_size = sd(PSize))
This gives:
         grp mean_size  sd_size
       (chr)     (dbl)    (dbl)
1 last_group   42330.5 16866.59
2     others   23597.9 13521.32
If you also want to include the row range, you can change the code to:
d %>%
  mutate(rn = row_number()) %>%
  group_by(grp = rep(c('others','last_group'), rep_vec)) %>%
  summarise(mean_size = mean(PSize), sd_size = sd(PSize),
            rows = paste0(range(rn), collapse=':'))
This gives:
         grp mean_size  sd_size  rows
       <chr>     <dbl>    <dbl> <chr>
1 last_group   42330.5 16866.59 11:18
2     others   23597.9 13521.32  1:10
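As a cross-check that does not depend on dplyr, the rle-based start row can be applied to the question's data in base R. This is only a sketch: `grp` and `means` are illustrative names, and read.table(text = ...) is used here in place of the question's read.delim call.

```r
# The example data from the question
s <- "no Spc PSize
2 0 6493
2 0 9281
2 12 26183
2 12 36180
2 12 37806
2 12 37765
3 12 36015
3 12 26661
3 0 14031
3 0 5564
3 1 17701
3 1 20808
3 1 31511
3 1 44746
3 1 50534
3 1 54858
3 1 58160
3 1 60326"
d <- read.table(text = s, header = TRUE)

# Sum of all run lengths except the last one, plus one, is the first row
# of the final constant run of Spc
startrow <- sum(head(rle(d$Spc)$lengths, -1)) + 1

# Label every row as belonging to the last run or to everything before it
grp <- ifelse(seq_len(nrow(d)) >= startrow, "last_group", "others")

# Named vector of group means of PSize
means <- tapply(d$PSize, grp, mean)
print(startrow)
print(means)
```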
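So you want to find the index at which a vector starts being constant? You can take the diff() of the vector and look for where it is non-zero. The last non-zero difference marks the point just before the final constant run begins. For example: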
vec <- c(1,2,3,4,5,5,5,6,6,6)
diff(vec)
differences <- rev(diff(vec))
# distance from the end of first non-zero
min.dist <- min(which(differences != 0))
# take difference
length(vec) - min.dist + 1
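The same idea can be wrapped in a small helper that also copes with a vector that never changes, a case in which `which()` returns nothing to take the minimum of. The function name `last_run_start` is hypothetical, and returning 1 for an all-constant vector is an assumption about the desired behaviour.

```r
# Index at which the final constant run of `vec` begins.
# Assumption: a vector with no changes at all is constant from index 1.
last_run_start <- function(vec) {
  # Positions i where vec[i + 1] differs from vec[i]
  changes <- which(diff(vec) != 0)
  if (length(changes) == 0) {
    return(1L)
  }
  # The final run starts right after the last change
  max(changes) + 1L
}

print(last_run_start(c(1, 2, 3, 4, 5, 5, 5, 6, 6, 6)))
print(last_run_start(c(5, 5, 5)))
```

Taking the maximum of the change positions avoids reversing the vector, and the early return keeps the all-constant case from producing a warning.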
Comments:
I want to compute two groups, the last group and the others.
@Leosar: see the latest edit for a version that compares the last group with all the other groups.
In my view this approach is more flexible than @Draglistatus Maximus's one.
@Leosar I think Mark's approach is fine, but imo it is neither more nor less flexible.