R: Repeat a value by group until a new value appears, leaving rows before the first non-NA value untouched
I want to repeat values until a new value appears, by group. I have a function that I found online a while ago that does almost what I'm looking for, but not quite. Here is the function:
repeat.before <- function(x) {
ind <- which(!is.na(x))
ind_rep <- ind
if (is.na(x[1])) {
ind_rep <- c(min(ind), ind)
ind <- c(1, ind)
}
rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}
The code above produces the following output:
group location
A New York
A New York
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
This is very close to what I want, but not quite. This is the output I'm looking for:
group location
A NA
A NA
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
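Basically, I don't want the "repeat" code to start working until it has found the first value. Until then, I want those rows left untouched. The goal is to avoid misclassifying rows: in the example above, the first two A rows should not be labeled New York.

One option is fill from tidyr after grouping by "group". fill takes a .direction argument, which can be "up" or "down" (the default). Based on the expected output, only "down" is needed here: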
library(dplyr)
library(tidyr)
df1 %>%
group_by(group) %>%
fill(location)
# A tibble: 9 x 2
# Groups: group [2]
# group location
# <chr> <chr>
#1 A <NA>
#2 A <NA>
#3 A New York
#4 A New York
#5 A New York
#6 B Chicago
#7 B Chicago
#8 B Philly
#9 B Philly
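If you would rather stay close to the original function, the same behaviour can be sketched in base R. The helper name repeat_keep_leading_na below is hypothetical (it is not from the question or from any package); it carries the last non-NA value forward and leaves everything before the first non-NA value as NA. Applying it through ave() restarts the fill within each group. A minimal sketch, assuming df1 has the group and location columns shown above:

```r
# Example data, same shape as df1 in this thread
df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: like repeat.before, but leading NAs are left untouched
repeat_keep_leading_na <- function(x) {
  ind <- which(!is.na(x))             # positions of the non-NA values
  if (length(ind) == 0) return(x)     # nothing to carry forward
  last_seen <- cumsum(!is.na(x))      # how many non-NA values seen so far (0 = none yet)
  filled <- last_seen > 0             # rows at or after the first non-NA value
  out <- x
  out[filled] <- x[ind[last_seen[filled]]]
  out
}

# ave() applies the helper separately within each group
df1$location2 <- ave(df1$location, df1$group, FUN = repeat_keep_leading_na)
df1
```

Because the helper is applied per group, a group that starts with NA keeps those NAs even if the previous group ended with a value.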
You can also use the na.locf function from the zoo package. With na.rm = FALSE, leading NAs are kept rather than dropped:
library(zoo)
df1 <-
structure(list(
group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"),
location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
"Philly", NA)
),
class = "data.frame",
row.names = c(NA,-9L))
df1$location2 <- na.locf(df1$location, na.rm = FALSE)
df1
group location location2
1 A <NA> <NA>
2 A <NA> <NA>
3 A New York New York
4 A <NA> New York
5 A <NA> New York
6 B Chicago Chicago
7 B <NA> Chicago
8 B Philly Philly
9 B <NA> Philly
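I'm not very familiar with dplyr, as I've only used it here and there. If I wanted to assign the result to a new column, e.g. "location_2", how would I do that with this method? Thanks for the quick reply, by the way! Edit: I believe this is tidyr, not dplyr?

Jared doesn't want the first two rows filled with "New York", so I would leave that fill out.

@jaredannible That's easy: df1 %>% mutate(location2 = location) %>% group_by(group) %>% fill(location2)

@CTHall Yes, if you change the direction to "down" it works fine. Thank you all!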
A base R option is ave, using cumsum(!is.na(df1$location)) as the grouping variable, so that each non-NA value starts a new run and rows before the first non-NA value stay NA:
transform(df1,
loc2 = ave(df1$location,
cumsum(!is.na(df1$location)),
FUN = function(x) x[1]))
# group location loc2
#1 A <NA> <NA>
#2 A <NA> <NA>
#3 A New York New York
#4 A <NA> New York
#5 A <NA> New York
#6 B Chicago Chicago
#7 B <NA> Chicago
#8 B Philly Philly
#9 B <NA> Philly
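One thing worth checking with an ungrouped approach: the fill does not know about group, so a value can be carried across a group boundary when a group starts with NA. That does not happen in df1, because group B starts with a non-NA value. A minimal sketch, using a made-up toy data frame (not from the question), of how passing group to ave as an additional grouping variable avoids this:

```r
# Toy data (an assumption for illustration): group B starts with NA
toy <- data.frame(
  group = c("A", "A", "B", "B"),
  location = c("New York", NA, NA, "Chicago"),
  stringsAsFactors = FALSE
)

# Each non-NA value starts a new run; 0 means "no value seen yet"
run_id <- cumsum(!is.na(toy$location))

# Ungrouped: "New York" leaks into the first row of group B
toy$ungrouped <- ave(toy$location, run_id, FUN = function(x) x[1])

# Grouped: runs are split by group as well, so the leading NA in B is kept
toy$grouped <- ave(toy$location, toy$group, run_id, FUN = function(x) x[1])

toy
```

In the grouped column, the third row stays NA, while the ungrouped column shows "New York" there.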