How to reshape a character string into two columns (date and text) in R?


I have the following character string:

cal = "\n \n21/01/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n21/01/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n03/02/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n17/02/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n11/03/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n11/03/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n24/03/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n25/03/2021\n\n        \nGeneral Council meeting of the ECB in Frankfurt\n        \n \n22/04/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n22/04/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n12/05/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n10/06/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in the Netherlands\n        \n \n10/06/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in the Netherlands\n        \n \n23/06/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n24/06/2021\n\n        \nGeneral Council meeting of the ECB in Frankfurt\n        \n \n22/07/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n22/07/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n09/09/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n09/09/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n22/09/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n23/09/2021\n\n        \nGeneral Council meeting of the ECB in Frankfurt\n        \n \n06/10/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n28/10/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n28/10/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n10/11/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n01/12/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n \n02/12/2021\n\n        \nGeneral Council meeting of the ECB in Frankfurt\n        \n \n16/12/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n16/12/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n"
# Flattened copy with every newline replaced by a space
cal_flat = gsub("\n", " ", cal)


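As you can see, the string contains dates and text. What I would like to do is split it into two columns: "Date" and "Event".

This would be the result (only the first rows are shown, for simplicity):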

Date                    Event

21/01/2021        Governing Council of the ECB: monetary policy meeting in Frankfurt
21/01/2021        Press conference following the Governing Council meeting of the ECB...
03/02/2021        Governing Council of the ECB: non-monetary policy meeting in Frankfurt
17/02/2021        Governing Council of the ECB: non-monetary policy meeting in Frankfurt
11/03/2021        Governing Council of the ECB: monetary policy meeting in Frankfurt        
...
I have tried many functions to reshape the corpus into sentences and to extract the dates, but I could not get it to work. For example:

library(anytime)
library(stringr)  # str_extract_all()
library(dplyr)    # %>%
anydate(str_extract_all(cal, "[[:alnum:]]+[ /]*\\d{2}[ /]*\\d{4}")[[1]]) %>% as.data.frame()

# it gives me back a lot of NAs, I don't know why

[1] NA           NA           "2021-03-02" NA           "2021-11-03" "2021-11-03" NA          
 [8] NA           NA           NA           "2021-12-05" "2021-10-06" "2021-10-06" NA          
[15] NA           NA           NA           "2021-09-09" "2021-09-09" NA           NA          
[22] "2021-06-10" NA           NA           "2021-10-11" "2021-01-12" "2021-02-12" NA          
[29] NA          

Can anyone help me?

Thanks

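Using read.table, we can split at \n. strip.white=TRUE omits elements that contain only whitespace. The resulting pattern is now date-event-date-..., so we can easily fill a two-column matrix row by row.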
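The code that belonged to this answer is not shown here. A minimal sketch of the approach it describes might look like the following, assuming a short three-entry sample in the same layout as cal. The names cal_sample, r and res are illustrative and do not come from the original answer.

```r
# Hypothetical three-entry sample with the same layout as cal
cal_sample <- "\n \n21/01/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n21/01/2021\n\n        \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n        \n \n03/02/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n"

# One field per line; strip.white = TRUE turns whitespace-only lines into
# blank lines, which read.table skips by default (blank.lines.skip = TRUE)
r <- read.table(text = cal_sample, sep = "\n", strip.white = TRUE,
                stringsAsFactors = FALSE)

# r$V1 now alternates date, event, date, event, ...
# Fill a two-column matrix row by row and name the columns
res <- setNames(
  as.data.frame(matrix(r$V1, ncol = 2, byrow = TRUE), stringsAsFactors = FALSE),
  c("Date", "Event")
)

# Assumption: every date is written as dd/mm/yyyy
res$Date <- as.Date(res$Date, format = "%d/%m/%Y")
res
```

This relies on every date being followed by exactly one event line; an entry with a missing or multi-line event would shift the columns.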


You can use str_match_all to extract data that follows a particular pattern:

library(stringr)

tmp <- data.frame(str_match_all(trimws(gsub('\\s+', ' ', cal)), 
                  '(\\d+/\\d+/\\d+)\\s([A-Za-z:\\s-]+)')[[1]][, -1])
tmp$X2 <- trimws(tmp$X2)
tmp

#           X1                                                                                     X2
#1  21/01/2021                     Governing Council of the ECB: monetary policy meeting in Frankfurt
#2  21/01/2021       Press conference following the Governing Council meeting of the ECB in Frankfurt
#3  03/02/2021                 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#4  17/02/2021                 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#5  11/03/2021                     Governing Council of the ECB: monetary policy meeting in Frankfurt
#6  11/03/2021       Press conference following the Governing Council meeting of the ECB in Frankfurt
#7  24/03/2021                 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#...
#...
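To spell out what the pattern does, here is a step-by-step sketch of the same idea on a hypothetical two-entry sample. The names cal_sample, flat, m and events are illustrative, and the conversion to Date assumes every date is written as dd/mm/yyyy.

```r
library(stringr)

# Hypothetical two-entry sample with the same layout as cal
cal_sample <- "\n \n21/01/2021\n\n        \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n        \n \n03/02/2021\n\n        \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n        \n"

# Step 1: collapse every run of whitespace (newlines included) into one space
flat <- trimws(gsub("\\s+", " ", cal_sample))

# Step 2: group 1 captures the date; group 2 captures the following run of
# letters, colons, whitespace and hyphens, which stops at the next digit
m <- str_match_all(flat, "(\\d+/\\d+/\\d+)\\s([A-Za-z:\\s-]+)")[[1]]

# Column 1 of m holds the full match, columns 2 and 3 hold the two groups
events <- data.frame(
  Date  = as.Date(m[, 2], format = "%d/%m/%Y"),
  Event = trimws(m[, 3]),
  stringsAsFactors = FALSE
)
events
```

Because the second group only allows letters, colons, whitespace and hyphens, an event description containing digits or other punctuation would be cut short; the character class would need widening for such data.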

Please add an explanation of this code: what it does and how it solves the problem.
Thank you very much, every answer works well. I accepted the answer whose code I am most familiar with. Thanks, everyone!

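library(dplyr)
library(tidyr)    # separate() comes from tidyr, not dplyr
library(stringr)

# Trim the outer whitespace, then split at the gap between an event and the next date
x = unlist(str_split(trimws(cal), "\n\\s{2,}\n\\s\n"))
y = data.frame(x, stringsAsFactors = FALSE)
# Split each piece into its date and its event text
y %>% separate(x, c("Date", "Event"), "\n\n\\s{2,}\n")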
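
On the NA values in the question's anydate() attempt: the results that did parse (for example 03/02/2021 returned as "2021-03-02") suggest the strings were read month-first, which would explain why every date with a day above 12 came back as NA. A small sketch, assuming the dates are always dd/mm/yyyy, passes the format explicitly to base R's as.Date. The vector dates_chr is illustrative.

```r
# Illustrative dates taken from the calendar string
dates_chr <- c("21/01/2021", "03/02/2021", "11/03/2021")

# State the day/month/year order explicitly instead of letting it be guessed
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
format(dates, "%Y-%m-%d")
# [1] "2021-01-21" "2021-02-03" "2021-03-11"
```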