Sorting in R with simple code
Can someone help me with the following problem? Sample dataset:
Ticketid Creation_Date Location Person
a1 01-02-2015 A John
b1 03-02-2015 B Jack
c1 03-02-2015 C Mint
a1 03-02-2015 D Manu
d1 03-02-2015 A Somu
e1 03-02-2015 A John
b1 11-02-2015 B Jack
a1 11-02-2015 C Mint
b1 14-02-2015 F John
b1 27-02-2015 E John
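Problem:
1. Remove duplicate Ticketids, filtering them as follows:
-> the creation date is less than 7 days after the date of the first occurrence.
For example, ticket id a1 has 3 creation dates: 01-02-2015, 03-02-2015 and 11-02-2015. I want a new column holding a repeat flag, and in this case the first occurrence (01-02-2015) should be marked Yes, because the second occurrence happened within 7 days of the first.
2. Following the same logic:
-> I want to filter by Location (Ticketid, Creation_Date)
-> I want to filter by Person (Ticketid, Creation_Date)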
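The rule in point 1 (mark a group's first occurrence when another occurrence follows within 7 days) can be sketched in base R as below. This is a minimal sketch, not code from the thread: `flag_first_repeat` and the `RepeatFlag` column are hypothetical names, and `group_cols` is an assumed parameter so the same rule can be run per Ticketid, per Location or per Person (or a combination such as `c("Location", "Ticketid")`, since the post does not say which grouping is meant). If several rows tie on a group's first date, all of them get flagged.

```r
# Sample data from the question (Creation_Date stored as a Date, not text)
df <- data.frame(
  Ticketid = c('a1','b1','c1','a1','d1','e1','b1','a1','b1','b1'),
  Creation_Date = as.Date(c('01-02-2015','03-02-2015','03-02-2015','03-02-2015',
                            '03-02-2015','03-02-2015','11-02-2015','11-02-2015',
                            '14-02-2015','27-02-2015'), format = '%d-%m-%Y'),
  Location = c('A','B','C','D','A','A','B','C','F','E'),
  Person = c('John','Jack','Mint','Manu','Somu','John','Jack','Mint','John','John'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag the FIRST occurrence in each group with "YES" when
# another row of the same group follows 1 to `days` days after it.
flag_first_repeat <- function(df, group_cols, days = 7) {
  grp <- interaction(df[group_cols], drop = TRUE)  # one factor from the key column(s)
  d <- as.numeric(df$Creation_Date)                # days since 1970-01-01
  first <- ave(d, grp, FUN = min)                  # earliest date in each group
  # 1 if some row lies 1..days days after the group's first date, else 0
  has_repeat <- ave(d, grp, FUN = function(x) any(x > min(x) & x - min(x) <= days))
  df$RepeatFlag <- ifelse(d == first & has_repeat == 1, "YES", "NO")
  df
}

by_ticket <- flag_first_repeat(df, "Ticketid")
by_location <- flag_first_repeat(df, "Location")
by_person <- flag_first_repeat(df, "Person")
by_ticket[by_ticket$RepeatFlag == "YES", ]
```

Keeping the data in its original row order (no merge or sort) makes it easy to bind `RepeatFlag` back onto the source table.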
Code:
If I understand your question correctly:
df <- data.frame(Ticketid = c('a1','b1','c1','a1','d1','e1','b1','a1','b1','b1'),
Creation_Date = as.Date(c('01-02-2015','03-02-2015','03-02-2015','03-02-2015','03-02-2015','03-02-2015','11-02-2015','11-02-2015','14-02-2015','27-02-2015'), format = '%d-%m-%Y'),
Location = c('A','B','C','D','A','A','B','C','F','E'),
Person = c('John','Jack','Mint','Manu', 'Somu','John', 'Jack', 'Mint','John','John') )
Ticketid Creation_Date Location Person
1 a1 2015-02-01 A John
2 b1 2015-02-03 B Jack
3 c1 2015-02-03 C Mint
4 a1 2015-02-03 D Manu
5 d1 2015-02-03 A Somu
6 e1 2015-02-03 A John
7 b1 2015-02-11 B Jack
8 a1 2015-02-11 C Mint
9 b1 2015-02-14 F John
10 b1 2015-02-27 E John
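library(dplyr)
# Earliest creation date for each ticket (sort first so slice(1) really picks the first date)
first_creation <- df %>%
select(Ticketid,First_Date = Creation_Date) %>%
arrange(First_Date) %>%
group_by(Ticketid) %>%
slice(1) %>%
ungroup()
# Attach First_Date to every row of the same ticket
df2 <- merge(first_creation,df, all.y = TRUE, by = 'Ticketid')
# Days between each row and its ticket's first occurrence (a difftime in days)
df3 <- df2 %>% mutate(time_diff = Creation_Date - First_Date)
# "YES" only for rows 1 to 7 days after the first occurrence; the first row itself (0 days) and rows more than 7 days later get "NO"
df_flagged <- df3 %>% group_by(Ticketid) %>% mutate(Within_7 = ifelse(time_diff > 7 | time_diff == 0, 'NO','YES'))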
Source: local data frame [10 x 7]
Groups: Ticketid
Ticketid First_Date Creation_Date Location Person time_diff Within_7
1 a1 2015-02-01 2015-02-01 A John 0 days NO
2 a1 2015-02-01 2015-02-03 D Manu 2 days YES
3 a1 2015-02-01 2015-02-11 C Mint 10 days NO
4 b1 2015-02-03 2015-02-03 B Jack 0 days NO
5 b1 2015-02-03 2015-02-11 B Jack 8 days NO
6 b1 2015-02-03 2015-02-14 F John 11 days NO
7 b1 2015-02-03 2015-02-27 E John 24 days NO
8 c1 2015-02-03 2015-02-03 C Mint 0 days NO
9 d1 2015-02-03 2015-02-03 A Somu 0 days NO
10 e1 2015-02-03 2015-02-03 A John 0 days NO
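The merge() step above only serves to copy each ticket's first date onto its rows. A base-R sketch of the same rule, assuming the sample `df` from the question, can do that with ave(), which returns one value per row in the original order, and can store time_diff as a plain number of days instead of a difftime. This is an alternative formulation, not the answerer's code.

```r
# Sample data from the question
df <- data.frame(
  Ticketid = c('a1','b1','c1','a1','d1','e1','b1','a1','b1','b1'),
  Creation_Date = as.Date(c('01-02-2015','03-02-2015','03-02-2015','03-02-2015',
                            '03-02-2015','03-02-2015','11-02-2015','11-02-2015',
                            '14-02-2015','27-02-2015'), format = '%d-%m-%Y'),
  Location = c('A','B','C','D','A','A','B','C','F','E'),
  Person = c('John','Jack','Mint','Manu','Somu','John','Jack','Mint','John','John')
)

# Earliest Creation_Date per ticket, repeated on every row of that ticket;
# ave() keeps the original row order, so no merge() is needed
first_num <- ave(as.numeric(df$Creation_Date), df$Ticketid, FUN = min)
df$First_Date <- as.Date(first_num, origin = "1970-01-01")
# Plain numeric day counts (Date minus Date is a difftime in days)
df$time_diff <- as.numeric(df$Creation_Date - df$First_Date, units = "days")
# Same rule as the answer: 1 to 7 days after the first occurrence -> "YES"
df$Within_7 <- ifelse(df$time_diff > 0 & df$time_diff <= 7, "YES", "NO")
df[order(df$Ticketid, df$Creation_Date), ]
```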
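I have edited your Q so your data is formatted better; please read StackOverflow's formatting instructions.
Thank you, sir. It would be great if someone could shed some light on this.
Can you post the expected output? Are you only checking the less-than-7-days condition between the first and second dates of each grouping variable?
Hi, I have updated the query in the question.
Thank you, sir; please see the updated query. This does partially solve my problem. I get the following error when running the first few lines: Error: could not find function "%>%"
You need to install the package 'dplyr' by typing install.packages('dplyr').
@col.slade: When I use this code on a huge dataset I get the following warning message, and time_diff and Within_7 come out as NA. I read the file as a CSV and then ran the calculation. Warning message: In Ops.factor(c(25L, 4L, 7L, 22L, 59L, 3L, 40L, 37L, 34L, 25L, 48L, ... : '-' not meaningful for factors
After I installed library(date) and library(timeDate) the code worked, but I found a new problem: the time differences are huge, e.g. 86400. Example:
Asset First_Date Serial Creation_Date Branch Engineer_Name time_diff Within_7
10k 03-03-2015 84901926444 04-03-2015 ND Mani 86400 NO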
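Both symptoms in the comments point at the type of Creation_Date. The warning appears when the column is read from the CSV as a factor, and 86400 is exactly the number of seconds in one day (03-03-2015 to 04-03-2015 in the example), which suggests the difference was taken between date-times in seconds rather than between Dates in days. A minimal sketch, assuming the CSV stores dates as dd-mm-yyyy (the inline CSV below stands in for the real file):

```r
# Simulate read.csv(): with stringsAsFactors = TRUE (the default before R 4.0)
# the date column arrives as a factor, and factor - factor gives NA plus
# the warning "'-' not meaningful for factors"
raw <- read.csv(text = "Ticketid,Creation_Date
a1,01-02-2015
a1,03-02-2015
a1,11-02-2015", stringsAsFactors = TRUE)

# Parse the text into Date objects (format assumed to be dd-mm-yyyy)
raw$Creation_Date <- as.Date(as.character(raw$Creation_Date), format = "%d-%m-%Y")

# Date-times (POSIXct) measured in seconds: one day is 86400
t1 <- as.POSIXct("2015-03-03", tz = "UTC")
t2 <- as.POSIXct("2015-03-04", tz = "UTC")
secs <- as.numeric(t2 - t1, units = "secs")
# Asking for days explicitly gives the unit the 7-day rule expects
n_days <- as.numeric(difftime(t2, t1, units = "days"))

# Days since the (single) ticket's first date, as a plain number
raw$time_diff <- as.numeric(raw$Creation_Date - min(raw$Creation_Date), units = "days")
raw
```

Converting the column once, right after reading the file, means the rest of the dplyr pipeline can stay unchanged.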
Answer 1: Location RepeatFlag Model1 Model2 Model3
Answer 2: Location Person RepeatFlag Model1 Model2 Model3
Answer 3: Location PartsUsed RepeatFlag Model1 Model2 Model3