Merging two rows of data in R based on rules (r, dplyr)


I combined two data frames with bind_rows. I now have pairs of rows like these:

Page Path                           Page Title             Byline      Pageviews 
/facilities/when-lighting-strikes      NA                    NA           668
/facilities/when-lighting-strikes   When Lighting Strikes  Tom Jones       NA

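When I have duplicate page paths like these, I would like to merge the rows that share a page path: drop the two NAs in the first row, keep the page title (When Lighting Strikes) and byline (Tom Jones), and keep the Pageviews value of 668 from the first row. It seems that I need steps to:

Identify the duplicate page paths

Check whether the titles and bylines differ; remove the NAs

Keep the row that holds the Pageviews value; remove the NA row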

Is there a way to do this in R with dplyr? Or is there a better way?
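The first step listed in the question, spotting page paths that occur more than once, can be sketched as follows. This is only an illustration: the third row ("/facilities/other-page") is hypothetical data added so that there is something to filter out, and the columns are plain character vectors rather than factors.

```r
library(dplyr)

# Example data from the question plus one hypothetical, non-duplicated row
df <- data.frame(
  PagePath  = c("/facilities/when-lighting-strikes",
                "/facilities/when-lighting-strikes",
                "/facilities/other-page"),          # hypothetical row
  PageTitle = c(NA, "When Lighting Strikes", "Other Page"),
  Byline    = c(NA, "Tom Jones", "Tom Jones"),
  Pageviews = c(668L, NA, 10L),
  stringsAsFactors = FALSE
)

# Keep only the rows whose PagePath appears more than once
dupes <- df %>%
  group_by(PagePath) %>%
  filter(n() > 1) %>%
  ungroup()

print(dupes)
```

Inspecting `dupes` before merging makes it easier to see whether the duplicated rows really complement each other or carry conflicting values.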

A simple solution:

library(dplyr)

df %>% group_by(PagePath) %>% summarise_each(funs(na.omit))
# Source: local data frame [1 x 4]
# 
#                            PagePath             PageTitle    Byline Pageviews
#                              (fctr)                (fctr)    (fctr)     (int)
# 1 /facilities/when-lighting-strikes When Lighting Strikes Tom Jones       668
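In current dplyr releases, summarise_each() and funs() are deprecated. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later, could look like this. The data frame is rebuilt here with character columns instead of factors, and taking the first non-missing value per column is a choice made for this sketch, not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  PagePath  = c("/facilities/when-lighting-strikes",
                "/facilities/when-lighting-strikes"),
  PageTitle = c(NA, "When Lighting Strikes"),
  Byline    = c(NA, "Tom Jones"),
  Pageviews = c(668L, NA),
  stringsAsFactors = FALSE
)

# For every non-grouping column, keep the first non-missing value in the group.
# Indexing with [1] returns NA if the whole group is missing, so each group
# always yields exactly one row.
merged <- df %>%
  group_by(PagePath) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")

print(merged)
```

If a group holds several different non-missing values in one column, this sketch silently keeps the first one, so it is worth checking for such conflicts beforehand.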

If your data is more complex, you may need a more robust approach.

Using the replace function in a for loop:

for (i in unique(df$Page_Path)) {
df$Pageviews[df$Page_Path==i]

Another approach (similar to the previous solution using dplyr) is:
  df %>% group_by(PagePath) %>% 
  dplyr::summarize(PageTitle = paste(na.omit(PageTitle)),
                   Byline = paste(na.omit(Byline)),
                   Pageviews =paste(na.omit(Pageviews)))
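Without a collapse argument, paste() returns one string per non-missing value, so a group with several non-missing values produces more than one value per column. A hedged variant that always returns a single value per group is sketched below. The data frame `df2`, its "/a" path, and the second byline "Ann Lee" are hypothetical, and summing Pageviews is an assumption about how duplicate counts should be combined.

```r
library(dplyr)

# Hypothetical group with two different bylines for the same page path
df2 <- data.frame(
  PagePath  = c("/a", "/a", "/a"),
  PageTitle = c(NA, "Title A", "Title A"),
  Byline    = c(NA, "Tom Jones", "Ann Lee"),
  Pageviews = c(668L, NA, NA),
  stringsAsFactors = FALSE
)

collapsed <- df2 %>%
  group_by(PagePath) %>%
  summarise(
    # unique() removes repeated values; collapse joins what is left into one string
    PageTitle = paste(unique(na.omit(PageTitle)), collapse = "; "),
    Byline    = paste(unique(na.omit(Byline)), collapse = "; "),
    # Keep Pageviews numeric instead of pasting it into a string
    Pageviews = sum(Pageviews, na.rm = TRUE),
    .groups = "drop"
  )

print(collapsed)
```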

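Here is an option using data.table and complete.cases. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'PagePath', loop through the columns of the dataset (lapply(.SD, ...)) and remove the NA elements with complete.cases. complete.cases returns a logical vector that can be used for subsetting. Reportedly, complete.cases is much faster than na.omit, and combining it with data.table should improve efficiency further.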
library(data.table)
setDT(df)[, lapply(.SD, function(x) x[complete.cases(x)]), by = PagePath]
#                     PagePath             PageTitle    Byline Pageviews
#1: /facilities/when-lighting-strikes When Lighting Strikes Tom Jones       668
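To make the subsetting step concrete, here is a small base R sketch of what complete.cases does on a single column. The vector `x` simply mirrors the PageTitle column of the example data.

```r
# One column from the example: the first value is missing
x <- c(NA, "When Lighting Strikes")

# complete.cases() returns TRUE where the value is not missing
keep <- complete.cases(x)

# Logical subsetting keeps only the non-missing elements
non_missing <- x[keep]

print(keep)
print(non_missing)
```

Inside the data.table call, the same subsetting is applied to each column of each PagePath group.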

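An alternative approach using fill. With tidyverse 1.3.0+ and dplyr 0.8.5+, you can use fill (from the tidyr package) to fill in missing values. More information was linked here.

Data (thanks, Alistaire):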
df <- structure(list(PagePath = structure(c(1L, 1L), .Label = "/facilities/when-lighting-strikes", class = "factor"), 
        PageTitle = structure(c(NA, 1L), .Label = "When Lighting Strikes", class = "factor"), 
        Byline = structure(c(NA, 1L), .Label = "Tom Jones", class = "factor"), 
        Pageviews = c(668L, NA)), .Names = c("PagePath", "PageTitle", 
    "Byline", "Pageviews"), class = "data.frame", row.names = c(NA, 
    -2L))

# A tibble: 2 x 4
# Groups:   PagePath [1]
  PagePath                          PageTitle             Byline    Pageviews
  <fct>                             <fct>                 <fct>         <int>
1 /facilities/when-lighting-strikes NA                    NA              668
2 /facilities/when-lighting-strikes When Lighting Strikes Tom Jones        NA

Grouping by PagePath and calling fill(PageTitle, .direction = "updown") gives you:
# A tibble: 2 x 4
# Groups:   PagePath [1]
  PagePath                          PageTitle             Byline    Pageviews
  <fct>                             <fct>                 <fct>         <int>
1 /facilities/when-lighting-strikes When Lighting Strikes NA              668
2 /facilities/when-lighting-strikes When Lighting Strikes Tom Jones        NA

Once all the NAs have been filled in, you can use distinct or rank to get the final summary data frame.
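A minimal sketch of that last step, assuming tidyr 1.0.0 or later (for the "updown" direction) and the example data rebuilt with character columns. Filling all three columns, not only PageTitle, is an assumption made here so that the two rows become identical.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  PagePath  = c("/facilities/when-lighting-strikes",
                "/facilities/when-lighting-strikes"),
  PageTitle = c(NA, "When Lighting Strikes"),
  Byline    = c(NA, "Tom Jones"),
  Pageviews = c(668L, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(PagePath) %>%
  # Fill missing values within each group, first upwards and then downwards
  fill(PageTitle, Byline, Pageviews, .direction = "updown") %>%
  ungroup() %>%
  # After filling, the rows of a group are identical, so distinct() keeps one
  distinct()

print(result)
```

If a group contains conflicting non-missing values, the filled rows will still differ and distinct() will keep more than one row for that page path.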
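A related question and answer was linked here.

Thanks. I tried the following, but when I try to view the result I get an error: expecting a single value. This is what I tried: dplyr::rename(byline = dimension2) %>% dplyr::rename(Site = profileName) %>% group_by(pagePath) %>% dplyr::summarise(pageTitle = paste(na.omit(pageTitle)), byline = paste(na.omit(byline)), pageviews = paste(na.omit(pageviews))). I should mention that this is Google Analytics data.

Can you provide some sample data? It would be easier to solve this if we could reproduce the error.

Thanks. I tried to implement this but ended up with all NA Pageviews. I should note that my data contains multiple rows.

I get a message about "expecting a single value" when I try summarise_each(funs(na.omit)).

If you post a representative subset of your data, I may be able to help. If it's just that some rows are already filled in correctly, you can collapse them with unique: summarise_each(funs(unique(na.omit(.)))). Otherwise, if you want several result rows per group, you may need to group by another variable.

I posted a more representative data example in the original question and highlighted two rows. For example, I would like to merge the two identical page paths for When Lighting Strikes. When merging, I want to keep the page title and byline from the second row and the pageviews from the first row.

Don't post images of data, because others can't import them. It looks like other columns that you didn't originally show are causing the problem, though; the unique approach above should work.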

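library(tidyr)  # provides fill(); dplyr supplies group_by() and %>%
df.new <- df %>%
  group_by(PagePath) %>%   # a line must end with %>%, not start with it
  fill(PageTitle, .direction = "updown")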
