R: unique values listed by date
I would like to obtain a data frame with two columns: 1. the distinct fruits (no duplicates); 2. the date on which a particular fruit (e.g. kiwis) first appeared.
I think this is what you want:
fruits <- c("apples, oranges, pears, bananas",
"pineapples, mangos, guavas",
"bananas, apples, kiwis")
fruits<-as.data.frame(fruits,stringsAsFactors=FALSE) #probably easier for the fruits to be strings rather than factors
fruits$date<-as.Date(c( "12.8.16", "22.4.17", "12.9.16"),format="%d.%m.%y") #and set your dates to be Dates rather than strings (otherwise they will be sorted alphabetically)
fruits[with(fruits, order(date)), ]
#need to convert your df to one-fruit-per-row
fruits2 <- do.call(rbind, #this binds together the data frames created by the lapply loop
  lapply(1:nrow(fruits), #loops through the rows of fruits df to create a list of data frames, each corresponding to one row
    function(i) data.frame(fruit=trimws(strsplit(fruits$fruits,",")[[i]]), #splits your strings at commas, and trims off the whitespace
                           date=fruits$date[i],stringsAsFactors = FALSE))) #adds the date corresponding to each row
#finding the first appearance is easily done using dplyr
library(dplyr)
fruits3 <- fruits2 %>% group_by(fruit) %>% summarise(firstdate=min(date))
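The same reshaping can also be sketched in base R, together with a lookup for a single fruit such as kiwis. This is only a minimal illustration: the names `long` and `first_seen` are hypothetical and do not come from the answer above, and the input is assumed to be the three-row data frame built there.

```r
# Input as in the answer above: one comma-separated string of fruits per date
fruits <- data.frame(
  fruits = c("apples, oranges, pears, bananas",
             "pineapples, mangos, guavas",
             "bananas, apples, kiwis"),
  date = as.Date(c("12.8.16", "22.4.17", "12.9.16"), format = "%d.%m.%y"),
  stringsAsFactors = FALSE
)

# Split every string once, then repeat each date once per fruit in its row
parts <- strsplit(fruits$fruits, ",")
long <- data.frame(
  fruit = trimws(unlist(parts)),
  date = rep(fruits$date, lengths(parts)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: earliest date on which one named fruit appears
first_seen <- function(df, name) min(df$date[df$fruit == name])

first_seen(long, "kiwis")
```

Note that `first_seen()` assumes the fruit is present; for a name that never appears, `min()` is called on an empty vector and warns.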
Here are some solutions:
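1) strsplit/unnest/summarize This uses dplyr and tidyr. First convert the date column to "Date" class and split the fruits column, producing a column in which each cell contains a vector of fruits. Then unnest it and find the minimum date for each fruit: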
library(dplyr)
library(tidyr)
fruits %>%
mutate(date = as.Date(date, "%d.%m.%y"),
fruits = strsplit(as.character(fruits), ", ")) %>%
unnest(fruits) %>%
group_by(fruits) %>%
summarize(date = min(date)) %>%
ungroup
giving:
# A tibble: 8 × 2
fruits date
<chr> <date>
1 apples 2016-08-12
2 bananas 2016-08-12
3 guavas 2017-04-22
4 kiwis 2016-09-12
5 mangos 2017-04-22
6 oranges 2016-08-12
7 pears 2016-08-12
8 pineapples 2017-04-22
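Converting to "Date" class before taking the minimum matters because strings in day.month.year form compare character by character, not chronologically. A small sketch with made-up dates (not taken from the question's data) shows the difference:

```r
# Illustrative dates only: 22 April 2016 and 12 September 2016
x <- c("22.4.16", "12.9.16")

# Compared as strings, "12.9.16" sorts first because "1" comes before "2"
min(x)

# Compared as dates, April comes before September
min(as.Date(x, format = "%d.%m.%y"))
```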
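2) strsplit/stack/aggregate This uses no packages. First we split the fruits column and name the components of the resulting list, L, with the dates. Then we stack the list, creating a data frame and renaming the columns, while also creating a true "Date" class column. Finally we aggregate to find the minimum: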
L <- with(fruits, setNames(strsplit(as.character(fruits), ", "), as.Date(date,"%d.%m.%y")))
stk <- with(stack(L), data.frame(fruits = values, date = as.Date(ind)))
aggregate(date ~ fruits, stk, min)
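To run this end to end and inspect the intermediate objects, the input has to be spelled out. The sketch below assumes the dates are still stored as day.month.year strings, which is what the `as.Date(date, "%d.%m.%y")` call implies; the name `result` is introduced here only for illustration.

```r
# Assumed input: fruits as comma-separated strings, dates as strings
fruits <- data.frame(
  fruits = c("apples, oranges, pears, bananas",
             "pineapples, mangos, guavas",
             "bananas, apples, kiwis"),
  date = c("12.8.16", "22.4.17", "12.9.16"),
  stringsAsFactors = FALSE
)

# Step 1: split the strings and name each list component with its date
L <- with(fruits, setNames(strsplit(fruits, ", "), as.Date(date, "%d.%m.%y")))
str(L)

# Step 2: stack() turns the named list into the columns 'values' and 'ind'
stk <- with(stack(L), data.frame(fruits = values, date = as.Date(ind)))

# Step 3: earliest date per fruit
result <- aggregate(date ~ fruits, stk, min)
result
```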
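Here is an approach using the splitstackshape package, which uses the data.table package underneath. We can use cSplit() to split the fruits column at the commas, then use data.table syntax to get the minimum date for each fruit: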
library(splitstackshape)
## create the long data frame from the split 'fruits' column
DT <- cSplit(fruits, "fruits", sep = ",", direction = "long")
## convert the 'date' column to date class and take the minimum row
DT[, .(date = min(as.IDate(date, "%d.%m.%y"))), by = fruits]
# fruits date
# 1: apples 2016-08-12
# 2: oranges 2016-08-12
# 3: pears 2016-08-12
# 4: bananas 2016-08-12
# 5: pineapples 2017-04-22
# 6: mangos 2017-04-22
# 7: guavas 2017-04-22
# 8: kiwis 2016-09-12
A variant of 1) that replaces strsplit/unnest with separate_rows from tidyr:
fruits %>%
mutate(date = as.Date(date, "%d.%m.%y")) %>%
separate_rows(fruits) %>%
group_by(fruits) %>%
summarize(date = min(date)) %>%
ungroup
Solution 2) gives:
fruits date
1 apples 2016-08-12
2 bananas 2016-08-12
3 guavas 2017-04-22
4 kiwis 2016-09-12
5 mangos 2017-04-22
6 oranges 2016-08-12
7 pears 2016-08-12
8 pineapples 2017-04-22