Extracting ratings and corresponding dates from text in R
I want to extract each bank's ratings and their dates from a data frame. In addition, each rating record should go on its own row, with the rating and the date split into two columns. Here is a sample of my data:
mydf <- data.frame("bank_name"=c("Bank A","Bank B"), "records"=c("Rating: B-\nRating Range: Jun-08-2017 to Present\n\nRating: B\nRating Range: Jan-23-2013 to Jun-08-2017","Rating: BBB-\nRating Range: Oct-02-2018 to Present\n\nRating: B\nRating Range: Apr-06-2018 to Oct-02-2018\n\nRating: A\nRating Range: Jun-08-2007 to Jan-31-2008\n\nRating: CCC\nRating Range: Jan-23-2006 to Aug-08-2007"))
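One option is to use str_extract_all to extract the characters that follow "Rating: " and "Rating Range: " in the records column into list columns, and then unnest those lists so that each element gets its own row.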
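library(tidyverse)

mydf %>%
  # collect every rating that follows "Rating: " into a list column
  mutate(ratings = str_extract_all(records, "(?<=Rating: )[A-E-]+"),
         # collect the start date that follows "Rating Range: "
         date = str_extract_all(records,
                                "(?<=Rating Range: )[A-Z][a-z]{2}-\\d{2}-\\d{4}")) %>%
  select(-records) %>%
  # one row per rating/date pair; tidyr >= 1.0.0 warns unless `cols` is given
  unnest(cols = c(ratings, date))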
# bank_name ratings date
#1 Bank A B- Jun-08-2017
#2 Bank A B Jan-23-2013
#3 Bank B BBB- Oct-02-2018
#4 Bank B B Apr-06-2018
#5 Bank B A Jun-08-2007
#6 Bank B CCC Jan-23-2006
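To see why the unnest step is needed, it can help to run the two patterns on a single cell first. The sketch below uses Bank A's string copied from the sample data; the variable names rec, ratings and dates are only illustrative. str_extract_all returns a list with one character vector per input string, which is what mutate stores as a list column.

```r
library(stringr)

# One cell of the records column, copied from the sample data (Bank A)
rec <- "Rating: B-\nRating Range: Jun-08-2017 to Present\n\nRating: B\nRating Range: Jan-23-2013 to Jun-08-2017"

# The lookbehind (?<=Rating: ) starts each match right after the label,
# so "Rating Range: " lines are not picked up by the first pattern
ratings <- str_extract_all(rec, "(?<=Rating: )[A-E-]+")
dates <- str_extract_all(rec, "(?<=Rating Range: )[A-Z][a-z]{2}-\\d{2}-\\d{4}")

ratings[[1]]  # "B-" "B"
dates[[1]]    # "Jun-08-2017" "Jan-23-2013"
```

Only the start date of each range is captured, because the end date is preceded by "to " rather than by "Rating Range: ".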
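It works! One thing to mention: I added a "+" to [A-E-], since a rating may end with a plus sign. (There is no A+ in my sample data, but a rating can contain a + at the end.) I'm not sure whether that is the right way to add the +, but it works!
@OllieMa You can add it inside the character class, as in [A-E-+]+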
Cool! Thank you very much for your help, @akrun
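As a minimal sketch of the "+" extension discussed in the comments: the record string below is hypothetical (the sample data contains no "+" ratings), and the character class is written as [A-E+-] with the hyphen placed last so that it is unambiguously a literal hyphen rather than part of a range. This ordering is a cautious variant, not the exact class quoted in the comments.

```r
library(stringr)

# Hypothetical record with a "+" rating; not part of the original sample data
rec_plus <- "Rating: A+\nRating Range: Jan-01-2019 to Present\n\nRating: BB-\nRating Range: Jan-01-2018 to Jan-01-2019"

# Inside a character class "+" is a literal plus sign; the trailing "-" is a literal hyphen
plus_ratings <- str_extract_all(rec_plus, "(?<=Rating: )[A-E+-]+")[[1]]

plus_ratings  # "A+" "BB-"
```

The same class can be dropped into the mutate call above in place of [A-E-].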