Extracting a duration from a character string in R


I am currently working with a dataset that I need to analyse. Here is a sample of the data:

      session_id    individ_id  colony     species           year_tracked
1 12141_2009-07-01 GBT_FP96194 Eynhallow Northern fulmar      2009_10
2 12141_2010-07-18 GBT_FP96235 Eynhallow Northern fulmar      2010_11
3 12143_2009-07-01 GBT_FC14766 Eynhallow Northern fulmar      2009_10
4 12143_2010-07-18 GBT_FR77883 Eynhallow Northern fulmar      2010_12
5 12144_2009-07-01 GBT_FP05030 Eynhallow Northern fulmar      2009_10
6 12145_2009-07-01 GBT_FA82356 Eynhallow Northern fulmar      2009_10

I need to create a new column containing the number of years tracked. In this case the column would be:

2010-2009 --> 1
2011-2010 --> 1
2010-2009 --> 1
2012-2010 --> 2
2010-2009 --> 1
2010-2009 --> 1

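The year_tracked column is of class character. Perhaps a function could convert the first 4 characters and the last 2 characters of each cell to dates, but I don't know how to do that.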
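The idea raised in the question (take the first four and the last two characters) does not need dates at all; integer arithmetic is enough. Below is a minimal base R sketch, assuming every value has the form YYYY_YY. The variable names (start_year, end_suffix, end_year, n_years) are illustrative and not from the original post.

```r
# Sample values copied from the year_tracked column shown above
year_tracked <- c("2009_10", "2010_11", "2009_10", "2010_12", "2009_10", "2009_10")

# First four characters: the full start year
start_year <- as.integer(substr(year_tracked, 1, 4))

# Last two characters: the two-digit end year
len <- nchar(year_tracked)
end_suffix <- as.integer(substr(year_tracked, len - 1, len))

# Assumption: the end year lies in the same century as the start year,
# or in the next one if the two-digit suffix wraps around (e.g. "1999_01")
end_year <- (start_year %/% 100L) * 100L + end_suffix
end_year <- ifelse(end_year < start_year, end_year + 100L, end_year)

n_years <- end_year - start_year
n_years
```

With the sample data this yields 1 1 1 2 1 1, matching the expected column.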

Here is a bit of regex: first extract the first year, using its four digits, with str_extract(, "[0-9]{4}"), then extract the second year with str_extract(, "(?

An option with separate:

library(dplyr)
library(tidyr)
library(stringr)
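df1 %>% 
    # "2009_10" -> "2009_2010": prefix the two-digit end year with the century
    mutate(year_tracked2 = str_replace(year_tracked, "_", "_20")) %>% 
    # split on the underscore; convert = TRUE turns both pieces into integers
    separate(year_tracked2, into = c('year1', 'year2'), convert = TRUE) %>%
    mutate(n = year2 - year1) %>%
    # drop the helper columns, keeping only the difference
    select(-year1, -year2)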
#       session_id  individ_id    colony         species year_tracked n
#1 12141_2009-07-01 GBT_FP96194 Eynhallow Northern fulmar      2009_10 1
#2 12141_2010-07-18 GBT_FP96235 Eynhallow Northern fulmar      2010_11 1
#3 12143_2009-07-01 GBT_FC14766 Eynhallow Northern fulmar      2009_10 1
#4 12143_2010-07-18 GBT_FR77883 Eynhallow Northern fulmar      2010_12 2
#5 12144_2009-07-01 GBT_FP05030 Eynhallow Northern fulmar      2009_10 1
#6 12145_2009-07-01 GBT_FA82356 Eynhallow Northern fulmar      2009_10 1
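
The two-step regex idea mentioned earlier (four digits for the first year, then the digits of the second year) can also be sketched with stringr alone. The pattern for the second year is cut off in the text above, so the "[0-9]{2}$" used here is an assumption rather than the original pattern, as is the hard-coded "20" century prefix.

```r
library(stringr)

# Sample values copied from the year_tracked column shown above
year_tracked <- c("2009_10", "2010_11", "2009_10", "2010_12", "2009_10", "2009_10")

# First year: the first run of four digits
year1 <- as.integer(str_extract(year_tracked, "[0-9]{4}"))

# Second year (assumed pattern): the two digits at the end of the string,
# prefixed with "20" on the assumption that all years fall in the 2000s
year2 <- as.integer(paste0("20", str_extract(year_tracked, "[0-9]{2}$")))

n <- year2 - year1
n
```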


Or a simpler option is to replace the _ with :20 and simply evaluate the result:
library(purrr)
df1 %>% 
   mutate(n = lengths(map(str_replace(year_tracked, "_", ":20"),
           ~ eval(parse(text = .x))))- 1)
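
To see why the evaluation trick works, it may help to look at the intermediate values for a single entry. The sketch below uses base R only; the strsplit variant at the end is a suggested alternative that avoids eval(parse()), not part of the original answer, and its variable names are illustrative.

```r
# One value: "2010_12" becomes the R expression "2010:2012"
expr_text <- sub("_", ":20", "2010_12", fixed = TRUE)
years_seq <- eval(parse(text = expr_text))  # the integer sequence 2010, 2011, 2012
n_one <- length(years_seq) - 1              # three years listed, so two years elapsed

# Alternative without evaluating text as code (assumes all years are in the 2000s)
year_tracked <- c("2009_10", "2010_11", "2009_10", "2010_12", "2009_10", "2009_10")
parts <- strsplit(sub("_", "_20", year_tracked, fixed = TRUE), "_", fixed = TRUE)
n_all <- vapply(parts, function(p) as.integer(p[2]) - as.integer(p[1]), integer(1))
n_all
```

Avoiding eval(parse()) is mostly a matter of robustness: a malformed entry then produces an NA or an error instead of running arbitrary text as code.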

Data

df1 <- structure(list(session_id = c("12141_2009-07-01", "12141_2010-07-18", 
"12143_2009-07-01", "12143_2010-07-18", "12144_2009-07-01", "12145_2009-07-01"
), individ_id = c("GBT_FP96194", "GBT_FP96235", "GBT_FC14766", 
"GBT_FR77883", "GBT_FP05030", "GBT_FA82356"), colony = c("Eynhallow", 
"Eynhallow", "Eynhallow", "Eynhallow", "Eynhallow", "Eynhallow"
), species = c("Northern fulmar", "Northern fulmar", "Northern fulmar", 
"Northern fulmar", "Northern fulmar", "Northern fulmar"), year_tracked = c("2009_10", 
"2010_11", "2009_10", "2010_12", "2009_10", "2009_10")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))