R: Repeat a value until a new value appears, by group, only once the first non-NA value has appeared

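I want to repeat values until a new value appears, by group. I have a function that I found online a while ago that does almost what I'm looking for, but not quite. Here is the function: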

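repeat.before <- function(x) {
  # positions of the non-NA values
  ind <- which(!is.na(x))
  ind_rep <- ind
  # if x starts with NA, reuse the first non-NA value for the leading NAs
  if (is.na(x[1])) {
    ind_rep <- c(min(ind), ind)
    ind <- c(1, ind)
  }
  # repeat each value until the next non-NA position
  rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}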

The code above produces the following output:

    group    location 
    A        New York
    A        New York
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

This is very close to what I want, but not quite. This is the output I'm after:

    group    location 
    A        NA
    A        NA
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

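Basically, I don't want the "repeat" code to start working until the first value is found. Until then, I want those rows left untouched. The goal is to avoid misclassifying rows: in the example above, the first two A rows should not be labelled New York.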
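
To make the difference concrete, the sketch below runs repeat.before on group A's values and compares it with a variant that leaves leading NAs alone. The name fill_down and the use of ave() to apply it per group are assumptions made for illustration; they are not part of the original post.

```r
# The function from the question, unchanged.
repeat.before <- function(x) {
  ind <- which(!is.na(x))
  ind_rep <- ind
  if (is.na(x[1])) {
    ind_rep <- c(min(ind), ind)
    ind <- c(1, ind)
  }
  rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}

# Hypothetical variant: carry each value forward, but leave
# everything before the first non-NA value as NA.
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))  # how many non-NA values seen so far
  seen[seen == 0L] <- NA     # nothing seen yet: keep NA
  x[!is.na(x)][seen]         # pick the most recent non-NA value
}

group_a <- c(NA, NA, "New York", NA, NA)
before <- repeat.before(group_a)  # leading NAs are back-filled
after  <- fill_down(group_a)      # leading NAs are kept

# Apply the variant per group with base R's ave().
df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)
df1$location2 <- ave(df1$location, df1$group, FUN = fill_down)
```

Here before holds "New York" five times, while after keeps its first two entries as NA, which is the behaviour asked for.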

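One option is fill from tidyr after grouping by 'group'. Its .direction argument can be specified as "up" or "down" (the default). Here, based on the expected output, we only need the "down" option: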

library(dplyr)
library(tidyr)
df1 %>%
  group_by(group) %>%
  fill(location) 
# A tibble: 9 x 2
# Groups:   group [2]
#  group location
#  <chr> <chr>   
#1 A     <NA>
#2 A     <NA>
#3 A     New York
#4 A     New York
#5 A     New York
#6 B     Chicago 
#7 B     Chicago 
#8 B     Philly  
#9 B     Philly  
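Data

df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
 "B"), location = c(NA, NA, "New York", NA, NA, "Chicago", NA, 
 "Philly", NA)), class = "data.frame", row.names = c(NA, -9L))

You can also use the na.locf function from zoo: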

library(zoo)
df1 <-
  structure(list(
    group = c("A", "A", "A", "A", "A", "B", "B", "B",
              "B"),
    location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
                 "Philly", NA)
  ),
  class = "data.frame",
  row.names = c(NA,-9L))

df1$location2 <- na.locf(df1$location, na.rm = FALSE)
df1

  group location location2
1     A     <NA>      <NA>
2     A     <NA>      <NA>
3     A New York  New York
4     A     <NA>  New York
5     A     <NA>  New York
6     B  Chicago   Chicago
7     B     <NA>   Chicago
8     B   Philly    Philly
9     B     <NA>    Philly
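
Note that the na.locf call above runs over the whole column, so a value could carry over from one group into the next if a later group started with NA. A minimal sketch of a per-group version follows. It assumes ave() from base R as the grouping mechanism, and df2 is a hypothetical data frame (not from the original post) in which group B begins with NA.

```r
library(zoo)

# Hypothetical data: group B now starts with a missing value.
df2 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, NA, "Chicago", "Philly", NA),
  stringsAsFactors = FALSE
)

# Whole-column fill: row 6 (group B) inherits "New York" from group A.
df2$ungrouped <- na.locf(df2$location, na.rm = FALSE)

# Per-group fill: ave() applies na.locf within each group separately.
df2$grouped <- ave(df2$location, df2$group,
                   FUN = function(x) na.locf(x, na.rm = FALSE))
```

With this data, row 6 is "New York" in the ungrouped column and stays NA in the grouped one.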

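I'm not very familiar with dplyr, as I've only used it here and there. If I wanted to assign the result to a new column, e.g. "location_2", how would I do that with this method? Thanks for the quick reply, by the way! Edit: I believe this is tidyr, not dplyr?
@Jared doesn't want the first two rows filled with "New York". I would omit the fill line.
@jaredannible That's easy: df1 %>% mutate(location2 = location) %>% group_by(group) %>% fill(location2)
@CTHall Yes, if you change the direction to "down" it works fine. Thank you both!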
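
The pipeline from the comments can be written out as a runnable sketch. It assumes df1 as defined in this thread and uses location2 as the new column name; .direction = "down" is spelled out even though it is the default, and the final ungroup() is an addition for convenience.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(location2 = location) %>%          # copy, so the original column is kept
  group_by(group) %>%                       # fill within each group only
  fill(location2, .direction = "down") %>%  # carry the last seen value forward
  ungroup()
```

The original location column is left unchanged, and location2 holds the filled values.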
Base R:

transform(df1,
          loc2 = ave(df1$location,
                     cumsum(!is.na(df1$location)),
                     FUN = function(x) x[1]))
#  group location     loc2
#1     A     <NA>     <NA>
#2     A     <NA>     <NA>
#3     A New York New York
#4     A     <NA> New York
#5     A     <NA> New York
#6     B  Chicago  Chicago
#7     B     <NA>  Chicago
#8     B   Philly   Philly
#9     B     <NA>   Philly
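
Because the run counter from cumsum is computed over the whole column, the first rows of a group can still pick up the last value of the previous group when that group starts with NA. One possible adjustment, sketched here as an assumption rather than as part of the original answer, is to pass group to ave() as an additional grouping variable. df2 is a hypothetical data frame in which group B begins with NA.

```r
# Hypothetical data: group B starts with a missing value.
df2 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, NA, "Chicago", "Philly", NA),
  stringsAsFactors = FALSE
)

run <- cumsum(!is.na(df2$location))  # increases at every non-NA value

# Runs only: row 6 shares a run with "New York" from group A.
df2$loc_runs <- ave(df2$location, run, FUN = function(x) x[1])

# Group and run together: each group starts fresh.
df2$loc_grouped <- ave(df2$location, df2$group, run,
                       FUN = function(x) x[1])
```

For the df1 in this thread both calls give the same result, since group B starts with a non-NA value.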