R: Sum of row counts in one data frame, based on row-specific dynamic conditions from another data frame
Consider the following data:
Country1 = c("Brazil", "India", "China","China","Brazil")
Date1<-as.Date(c("2001-01-21", "2002-04-13","2003-06-19","2006-06-19","2007-06-19"))
Name1<-c("B","C","A","A","A")
Data1<-data.frame(Country1,Date1,Name1)
Name2<-c("B","B","C","C","C","A","A","A")
Quality2<-c("good","good","medium","good","good","bad","good","good")
Country2<-c("China","Brazil","Taiwan","India","India","United States","China","Brazil")
Date2<-as.Date(c("2002-02-21", "1999-03-13","1998-08-19", "1996-09-13","2000-12-12","1998-07-21","2005-03-22","2003-06-19"))
Data2<-data.frame(Name2,Quality2,Country2,Date2)
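We can filter Quality2 to keep only the "good" rows, join the result with Data1, group by Country2, and compute both the number of rows where Date2 is earlier than Date1 and the minimum of those Date2 values: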
library(dplyr)
Data2 %>%
filter(Quality2 == 'good') %>%
right_join(Data1, by = c('Name2' = 'Name1', 'Country2' = 'Country1')) %>%
group_by(Country2) %>%
summarise(Result = sum(Date2 < Date1),
Date1 = min(Date2[Date2 < Date1]))
# A tibble: 3 x 3
# Country2 Result Date1
# <chr> <int> <date>
#1 Brazil 1 1999-03-13
#2 China 0 NA
#3 India 2 1996-09-13
For the updated data, we can change the approach and do the following:
Data1 %>%
left_join(Data2, by = c('Name1' = 'Name2', 'Country1' = 'Country2')) %>%
group_by(Country1, Date1) %>%
summarise(Result = sum(Date2 < Date1 & Quality2 == "good"),
Date = min(Date2[Date2 < Date1 & Quality2 == "good"]))
# Country1 Date1 Result Date
# <chr> <date> <int> <date>
#1 Brazil 2001-01-21 1 1999-03-13
#2 China 2003-06-19 0 NA
#3 China 2006-06-19 1 2005-03-22
#4 India 2002-04-13 2 1996-09-13
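Thank you very much for the quick reply. However, Data1 may contain multiple observations with the same Name1. In that case, I am not sure how to add the result columns from the output of your code. Edit: I have added a fourth row to Data1 in the main post.

Sorry for repeatedly editing the data to make the case more complex. In the actual data, several categories of Data1$Name1 share the same Data1$Country1. In the edited post I added a fifth row where Country1 == "Brazil". As mentioned, the actual data runs to thousands of rows, so values of Date1, Country1, or Name1 are repeated in many cases. No entry in Country1 or Date1 is therefore specific to a particular entry in Name1. Row 5 of Data1 is an example (Brazil is a repeated entry). The second question remains the same: how can we add Data1$Result and Data1$Min.Date.Result from the output of your code? Thank you very much for your patience and help.

@KashifAhmed I am not sure how this edit changes the way the answer applies. Did you try the answer? What did it return? You may want to add Name1 as another group in group_by. As for the second question, it is already in the answer: you need to assign the result, like Data3 <- Data1 %>% left_join(Data2, by = ... followed by the rest of the answer. In my answer, Min.Date.Result is called Date. Check Data3.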
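The suggestion in the last comment can be sketched as follows. This is only an illustration under some assumptions: the data frames are rebuilt with stringsAsFactors = FALSE, the column name Min.Date.Result is taken from the question rather than from the answer (which calls it Date), and sort(...)[1] is swapped in for min() so that a group with no match yields NA without a warning. Recent dplyr versions may also warn about a many-to-many join here, since some key pairs repeat on both sides.

```r
library(dplyr)

# Sample data copied from the question (strings kept as character)
Data1 <- data.frame(
  Country1 = c("Brazil", "India", "China", "China", "Brazil"),
  Date1 = as.Date(c("2001-01-21", "2002-04-13", "2003-06-19",
                    "2006-06-19", "2007-06-19")),
  Name1 = c("B", "C", "A", "A", "A"),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Name2 = c("B", "B", "C", "C", "C", "A", "A", "A"),
  Quality2 = c("good", "good", "medium", "good", "good", "bad", "good", "good"),
  Country2 = c("China", "Brazil", "Taiwan", "India", "India",
               "United States", "China", "Brazil"),
  Date2 = as.Date(c("2002-02-21", "1999-03-13", "1998-08-19", "1996-09-13",
                    "2000-12-12", "1998-07-21", "2005-03-22", "2003-06-19")),
  stringsAsFactors = FALSE
)

# Group by every Data1 column so each distinct Data1 row gets its own result
Data3 <- Data1 %>%
  left_join(Data2, by = c("Name1" = "Name2", "Country1" = "Country2")) %>%
  group_by(Country1, Date1, Name1) %>%
  summarise(
    # Number of "good" Data2 rows dated before this row's Date1
    Result = sum(Date2 < Date1 & Quality2 == "good", na.rm = TRUE),
    # Earliest such date; NA when no row qualifies
    Min.Date.Result = sort(Date2[Date2 < Date1 & Quality2 == "good"])[1],
    .groups = "drop"
  )

print(Data3)
```

Because the grouping keys are exactly the columns of Data1, Data3 holds the Data1 columns plus the two result columns, as long as no two rows of Data1 are identical in all three columns. Rows come back sorted by the grouping keys rather than in the original Data1 order.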
# Desired Result for row 1 of Data1
sum(Data2$Name2==as.character(Data1$Name1)[1] & Data2$Country2==as.character(Data1$Country1)[1] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[1])
# Row 2
sum(Data2$Name2==as.character(Data1$Name1)[2] & Data2$Country2==as.character(Data1$Country1)[2] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[2])
# Any later row of the real data, e.g. row 54342
sum(Data2$Name2==as.character(Data1$Name1)[54342] & Data2$Country2==as.character(Data1$Country1)[54342] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[54342])
# General form for row n
sum(Data2$Name2==as.character(Data1$Name1)[n] & Data2$Country2==as.character(Data1$Country1)[n] & Data2$Quality2=="good" & Data2$Date2 < Data1$Date1[n])
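The row-by-row expressions above can also be wrapped in a function and applied to every row index with base R. The sketch below is one possible way to do that, not the approach from the answer. The helper names matches_row, count_good_before, and min_good_before are hypothetical, and the data is rebuilt with stringsAsFactors = FALSE so the comparisons work on plain character vectors.

```r
# Sample data copied from the question (strings kept as character)
Data1 <- data.frame(
  Country1 = c("Brazil", "India", "China", "China", "Brazil"),
  Date1 = as.Date(c("2001-01-21", "2002-04-13", "2003-06-19",
                    "2006-06-19", "2007-06-19")),
  Name1 = c("B", "C", "A", "A", "A"),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  Name2 = c("B", "B", "C", "C", "C", "A", "A", "A"),
  Quality2 = c("good", "good", "medium", "good", "good", "bad", "good", "good"),
  Country2 = c("China", "Brazil", "Taiwan", "India", "India",
               "United States", "China", "Brazil"),
  Date2 = as.Date(c("2002-02-21", "1999-03-13", "1998-08-19", "1996-09-13",
                    "2000-12-12", "1998-07-21", "2005-03-22", "2003-06-19")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: logical vector marking the Data2 rows that satisfy
# the question's condition for row i of Data1
matches_row <- function(i) {
  Data2$Name2 == Data1$Name1[i] &
    Data2$Country2 == Data1$Country1[i] &
    Data2$Quality2 == "good" &
    Data2$Date2 < Data1$Date1[i]
}

# Hypothetical helper: same count as the hand-written sum(...) for row i
count_good_before <- function(i) sum(matches_row(i))

# Hypothetical helper: earliest qualifying Date2, or NA if none qualifies
min_good_before <- function(i) {
  hits <- Data2$Date2[matches_row(i)]
  if (length(hits) == 0) as.Date(NA) else min(hits)
}

rows <- seq_len(nrow(Data1))
Data1$Result <- sapply(rows, count_good_before)
# lapply + c() keeps the Date class, which sapply would drop
Data1$Min.Date.Result <- do.call(c, lapply(rows, min_good_before))

print(Data1)
```

This keeps Data1 in its original row order and handles duplicated Name1, Country1, or Date1 values, because each row is evaluated on its own. It scans Data2 once per Data1 row, so on data with many thousands of rows the join-based approach is likely to be faster.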