How to convert a list of text into a data frame in R


I have a somewhat messy CSV file, like this:

- Page 1
- Hugh
- Grant
- First Name:
- Last Name:
- Age: 54
- Movies:
- Notting Hill
- 4 Weddings & A Funeral
- Music and Lyrics
- Scarlett
- Johansson
- First Name:
- Last Name:
- Age: 29
- Movies:
- The Avengers
- Chef
- Lucy
- Page 2
- Mark
- Wahlberg
- First Name:
- Last Name:
- Age: 43
- Movies:
- Ted
- Transformers: Age of Extinction

I want to turn it into a table like this:

- First Name Last Name Age Movies
- Hugh       Grant     54  Notting Hill, 4 Weddings & a Funeral, Music & Lyric
- Scarlett   Johansson 29  The Avengers, Chef, Lucy
- Mark       Wahlberg  43  Ted, Transformers: Age of Extinction

How can I create a data frame like this in R? Note that the original list is about 16,000 lines long (i.e., a 16000×1 data frame).

Based on the data shown, you could try:

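 library(stringr)

 lines <- readLines("movies.txt")
 # Drop the page markers ("- Page 1", "- Page 2", ...)
 lines1 <- lines[!grepl("^-\\s*Page\\s+\\d+\\s*$", lines)]
 # Positions of the "First Name:" and "Movies:" labels, one per person
 indx1 <- grep("First Name:", lines1)
 indx2 <- grep("Movies:", lines1)
 # Row labels; the age shares a line with its "Age:" label, so keep only the label part
 indx <- grep("^-\\s*(First Name|Last Name|Age|Movies):", lines1)
 labels <- unique(sub("^([^:]*:).*$", "\\1", lines1[indx]))
 # First name, last name and age sit at fixed offsets from "First Name:"
 m1 <- t(sapply(c(-2, -1, 2), function(i) lines1[indx1 + i]))
 m1[3, ] <- sub("Age:", "", m1[3, ])
 # Up to three lines after each "Movies:" label (fits the sample data only)
 m2 <- t(sapply(1:3, function(i) lines1[indx2 + i]))
 m3 <- rbind(m1, m2)
 # Strip the leading "-" and surrounding whitespace, then attach the row labels
 dat <- data.frame(names = c(labels, rep("-", 2)),
        matrix(str_trim(sub("^-", "", m3)), nrow = 6), stringsAsFactors = FALSE)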

 dat
 #         names                     X1           X2
 #1 - First Name:                   Hugh     Scarlett
 #2  - Last Name:                  Grant    Johansson
 #3        - Age:                     54           29
 #4     - Movies:           Notting Hill The Avengers
 #5             - 4 Weddings & A Funeral         Chef
 #6             -       Music and Lyrics         Lucy
  #                              X3
 #1                            Mark
 #2                        Wahlberg
 #3                              43
 #4                             Ted
 #5 Transformers: Age of Extinction
 #6                            <NA>
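
The snippet above relies on fixed offsets that fit the three people in the sample. For the full 16,000-line input, a more general approach may be easier to maintain. The following is a minimal sketch, not a definitive implementation. It assumes every record follows the layout shown in the question: two name lines, then the "First Name:" and "Last Name:" labels, an "Age:" line carrying the value, a "Movies:" label, and movie titles running until the next record. The function name `parse_people` is hypothetical, and the sample vector stands in for `readLines("movies.txt")`.

```r
# Sample input copied from the question; in practice this would come from
# lines <- readLines("movies.txt")
lines <- c(
  "- Page 1", "- Hugh", "- Grant", "- First Name:", "- Last Name:",
  "- Age: 54", "- Movies:", "- Notting Hill", "- 4 Weddings & A Funeral",
  "- Music and Lyrics", "- Scarlett", "- Johansson", "- First Name:",
  "- Last Name:", "- Age: 29", "- Movies:", "- The Avengers", "- Chef",
  "- Lucy", "- Page 2", "- Mark", "- Wahlberg", "- First Name:",
  "- Last Name:", "- Age: 43", "- Movies:", "- Ted",
  "- Transformers: Age of Extinction"
)

# Hypothetical helper: one output row per person, any number of movies
parse_people <- function(lines) {
  # Strip the leading "-" marker and surrounding whitespace
  x <- trimws(sub("^-\\s*", "", lines))
  # Drop page markers such as "Page 1"
  x <- x[!grepl("^Page\\s+\\d+$", x)]

  # Each record is anchored by its "First Name:" label; the two lines
  # before the label hold the first and last name
  starts <- which(x == "First Name:") - 2
  # A record runs up to the line before the next record starts
  ends <- c(starts[-1] - 1, length(x))
  recs <- Map(function(s, e) x[s:e], starts, ends)

  # Age: the text after "Age:" on the first line that starts with that label
  age <- vapply(recs, function(rec) {
    sub("^Age:\\s*", "", grep("^Age:", rec, value = TRUE)[1])
  }, character(1))

  # Movies: every line after the "Movies:" label, joined with ", "
  movies <- vapply(recs, function(rec) {
    m <- match("Movies:", rec)
    if (is.na(m) || m == length(rec)) return("")
    paste(rec[(m + 1):length(rec)], collapse = ", ")
  }, character(1))

  data.frame(
    `First Name` = x[starts],
    `Last Name` = x[starts + 1],
    Age = as.integer(age),
    Movies = movies,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

people <- parse_people(lines)
print(people)
```

Because each record is delimited by the position of the next "First Name:" label, the number of movies per person can vary. A record that breaks the assumed layout (for example, a missing name line) would need extra checks.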

"Somewhat" messy? You say this is a CSV file, but your input shows no delimiters.

Thanks! That was a big help.

No problem. Glad it helped.