Merge two rows into one based on a column value in R

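Please ignore this section and see @START HERE below.

I'm trying to merge the following two rows:

into a single row like this:

Here is the code that creates the dataset: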

dataset <- data.frame(Environment=c("PRODUCTION","PRODUCTION"),
                      Green=c("Yes","No"),
                      Red=c("No","Yes"),
                      Completed=c("Yes","Yes"))
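Any help would be greatly appreciated.

START HERE - question updated

I realized that my question did not reflect the problem I'm actually trying to solve. Let me try again. Here is the dataset:

I would like it to look like this:

There may be many other columns. What I want is this: for the same ID, if several rows share the same Environment, combine them and return Yes for a column if any of those rows has Yes; otherwise return the default value. I hope that is better wording.

Here is the new dataset: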

dataset <- data.frame(ID=c(15,15,15,16,16,16,16),Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                                                               "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "Yes", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))
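The rule described above (within each ID and Environment, a column is Yes if any row has Yes) can be sketched in base R as follows. This is only an illustration under stated assumptions: `any_yes` and `flag_cols` are hypothetical names introduced here, and "No" is assumed to be the default value.

```r
# Dataset from the updated question
dataset <- data.frame(ID = c(15, 15, 15, 16, 16, 16, 16),
                      Environment = c("PRODUCTION", "PRODUCTION", "TRAINING",
                                      "PRODUCTION", "PRODUCTION", "TRAINING", "STAGING"),
                      Green = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"),
                      Red = c("No", "Yes", "No", "No", "Yes", "No", "No"),
                      Completed = c("Yes", "Yes", "No", "Yes", "Yes", "No", "No"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: "Yes" if any value in the group is "Yes",
# otherwise "No" (assumed here to be the default value)
any_yes <- function(x) if (any(x == "Yes")) "Yes" else "No"

# Every column other than the two grouping keys is treated as a Yes/No flag
flag_cols <- setdiff(names(dataset), c("ID", "Environment"))

# One output row per (ID, Environment) combination
combined <- aggregate(dataset[flag_cols],
                      dataset[c("ID", "Environment")],
                      any_yes)
combined <- combined[order(combined$ID, combined$Environment), ]
print(combined)
```

Because the helper tests for "Yes" explicitly, it does not depend on how the strings happen to sort.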
Based on @P.Routh's code, I think we are one step closer. I modified the dataset to show that a static signature will break the code:

dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                      "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      White=c("No","No", "No","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))
With that said, this is what I would like to get:

@P.Routh's modified code gives the wrong output:

df <- dataset%>%group_by(ID,Environment)%>%
  mutate(total = n())%>%  #this counter acts as the condition you need
  unite(signature,Green,Red,White,Black,Completed,sep = ":")%>% #combines the columns into one column
  mutate(dummy = "Yes:Yes:Yes:Yes:Yes")%>% #just a dummy column to faciliate in specifying the condition
  mutate(new_val = ifelse(total>1,dummy,signature))%>% #this is the condition
  select(-signature:-dummy)%>%
  separate(new_val, c("Green","Red","White","Black","Completed"),":") #restores original output
unique(df)
Try this, using dplyr and zoo.

First approach

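library(dplyr); library(zoo)
dataset[dataset == "No"] <- NA  # treat "No" as missing so that "Yes" can be carried forward
# Fill NAs within each group from the previous row, then keep the last row of each group
dataset %>% group_by(Environment) %>% mutate_all(na.locf, na.rm = FALSE) %>% filter(row_number() == n())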

  Environment  Green    Red Completed
       <fctr> <fctr> <fctr>    <fctr>
1  PRODUCTION    Yes    Yes       Yes
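A solution using dplyr. The key is to set the factor levels for every column except Environment, and then summarise each column by its minimum. mutate_at and summarise_all handle this efficiently.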

# Load package
library(dplyr)

# Process the data
dataset2 <- dataset %>%
  # Set factor level to all columns except Environment
  mutate_at(vars(-Environment), factor, levels = c("Yes", "No"), ordered = TRUE) %>%
  group_by(Environment) %>%
  summarise_all(min)  # "Yes" is the lowest level, so the minimum is "Yes" whenever any row has it
In base R, you can use aggregate like this:

aggregate(dataset[-1], dataset["Environment"], function(x) max(as.character(x)))
which returns

  Environment Green Red Completed
1  PRODUCTION   Yes Yes       Yes
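It looks like the question changed after I answered. However, a small modification to my original code produces the desired output, apart from some reordering of the rows.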

aggregate(dataset[-(1:2)], dataset[c("Environment", "ID")], 
          function(x) max(as.character(x)))

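Note that this assumes the strings sort so that success comes after failure lexicographically (here "Yes" > "No"). If the order is reversed, take the minimum instead. Also, in cases like this it is easier to work with numeric codes than with text. A second solution is to convert the text to numbers and then perform the operation above.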
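The numeric-code variant mentioned above is not spelled out in the answer, so here is a minimal sketch of one way it could look, using the dataset from the original question. The names `flag_cols`, `coded` and `merged` are assumptions made for illustration; Yes is coded as 1 and No as 0.

```r
# Dataset from the original question
dataset <- data.frame(Environment = c("PRODUCTION", "PRODUCTION"),
                      Green = c("Yes", "No"),
                      Red = c("No", "Yes"),
                      Completed = c("Yes", "Yes"),
                      stringsAsFactors = FALSE)

# Columns holding Yes/No flags (everything except the grouping column)
flag_cols <- setdiff(names(dataset), "Environment")

# Encode Yes as 1 and No as 0, so max() no longer relies on string ordering
coded <- dataset
coded[flag_cols] <- lapply(dataset[flag_cols], function(x) as.integer(x == "Yes"))

# The maximum per group is 1 whenever at least one row was "Yes"
merged <- aggregate(coded[flag_cols], coded["Environment"], max)

# Decode back to text for display
merged[flag_cols] <- lapply(merged[flag_cols], function(x) ifelse(x == 1, "Yes", "No"))
print(merged)
```

With 0/1 codes the result stays correct even if the labels are later renamed to strings that sort differently.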

I hope I'm not too late. My solution uses dplyr and tidyr.


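Thanks @P.Routh, @Wen and @eipi10. I took all of your ideas and came up with code that actually works on my large dataset. Here is the dataset posted above, followed by the working code: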

#load library
library(dplyr)

#create dataframe
dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                      "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      White=c("No","No", "No","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))


# Count the rows in each (ID, Environment) group
df <- dataset %>% group_by(ID, Environment) %>% mutate(total = n())

ddc <- df[df$total == 1, ]  # groups without duplicates
ddd <- df[df$total > 1, ]   # groups with duplicates (two or more rows)

# Collapse each duplicated group, keeping "Yes" wherever any row has it
ddd <- ddd %>% group_by(ID, Environment) %>% summarise_all(function(x) max(as.character(x)))

merge(ddc, ddd, all=TRUE)

Thanks, everyone.

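I think you can do it like this: dataset %>% group_by(Environment) %>% summarise_all(funs(max(as.character(.))))

@eipi10 Thanks~ I'm always surprised by something new.

@eipi10 and @Wen This seems to work on the test dataset.

I tried your approach, @Wen. However, I didn't use mutate_each. Thanks, everyone. I'll let you know in a few minutes.

Do we need to include a condition that checks whether the Environment has more than one value, @LeeS?

@P.Routh That's correct. I realized my question was insufficient: the solutions are based on a single Environment value. So I have been working on revising the question; please see above.

Please see whether my solution works.

@P.Routh I saw it. I had to take a walk to get away from the screen. I'm testing it now. Thanks, @P.Routh and everyone else.

@P.Routh You are not late at all. I had to take a walk. Let me test it. It works on the sample data frame I created. Nice code. However, I think creating a static signature will break your code.

@LeeS I agree. The code could be better. I was just trying to be creative. Sorry it doesn't work with your original data.

@P.Routh I appreciate your effort. It gave me something to think about. I looked at this and tried different approaches, but couldn't figure out how to merge two rows that have different values based on a specific column.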
dataset <- data.frame(ID=c(15,15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION","PRODUCTION", "TRAINING",
                                    "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No", "No","No","Yes", "No", "No"),
                      White=c("No","No", "Yes","Yes","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","No","Yes","Yes", "No", "No"))

# Works for groups of any size; no need to split duplicated and unique groups first
dataset %>% group_by(ID, Environment) %>% summarise_all(function(x) max(as.character(x)))
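
For readers on a recent dplyr, the same grouped summary can also be written with across(). This is a sketch that assumes dplyr 1.0.0 or later; it is not part of the original thread. It uses the eight-row dataset above, in which ID 15 has three PRODUCTION rows, and `result` is a name introduced here.

```r
library(dplyr)

# Eight-row dataset from above: ID 15 has three PRODUCTION rows
dataset <- data.frame(ID = c(15, 15, 15, 15, 16, 16, 16, 16),
                      Environment = c("PRODUCTION", "PRODUCTION", "PRODUCTION", "TRAINING",
                                      "PRODUCTION", "PRODUCTION", "TRAINING", "STAGING"),
                      Green = c("Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes"),
                      Red = c("No", "Yes", "No", "No", "No", "Yes", "No", "No"),
                      White = c("No", "No", "Yes", "Yes", "No", "No", "No", "No"),
                      Black = c("No", "No", "No", "No", "No", "No", "No", "No"),
                      Completed = c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"),
                      stringsAsFactors = FALSE)

# across(everything(), ...) applies the function to every non-grouping column;
# max() on the strings returns "Yes" whenever any row in the group has "Yes"
result <- dataset %>%
  group_by(ID, Environment) %>%
  summarise(across(everything(), ~ max(as.character(.x))), .groups = "drop")

print(result)
```

As with the aggregate answer, this relies on "Yes" sorting after "No".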