How to convert a list of text lines into a data frame in R
I have a CSV file that is a bit messy, for example:
- Page 1
- Hugh
- Grant
- First Name:
- Last Name:
- Age: 54
- Movies:
- Notting Hill
- 4 Weddings & A Funeral
- Music and Lyrics
- Scarlett
- Johansson
- First Name:
- Last Name:
- Age: 29
- Movies:
- The Avengers
- Chef
- Lucy
- Page 2
- Mark
- Wahlberg
- First Name:
- Last Name:
- Age: 43
- Movies:
- Ted
- Transformers: Age of Extinction
I would like to turn it into a table like the following:
First Name  Last Name  Age  Movies
Hugh        Grant      54   Notting Hill, 4 Weddings & a Funeral, Music & Lyric
Scarlett    Johansson  29   The Avengers, Chef, Lucy
Mark        Wahlberg   43   Ted, Transformers: Age of Extinction
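How can I create such a data frame in R? Note that the original list is about 16,000 entries long (i.e., a 16000×1 data frame).

Based on the data shown, you could try: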
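library(stringr)
lines <- readLines("movies.txt")
# Drop the "Page N" separator lines
lines1 <- lines[!grepl("Page", lines)]
# Positions of the "First Name:" and "Movies:" label lines
indx1 <- grep("First.*:", lines1)
indx2 <- grep("Movies:", lines1)
# Offsets from "First Name:": -2 = first name, -1 = last name, +2 = "Age: NN"
m1 <- t(sapply(c(-2, -1, 2), function(i) lines1[indx1 + i]))
m1[3, ] <- sub("Age:", "", m1[3, ])
# The three lines after each "Movies:" label (assumes three movies per record;
# a final record with fewer movies yields NA)
m2 <- t(sapply(1:3, function(i) lines1[indx2 + i]))
m3 <- rbind(m1, m2)
# Strip the leading "-" marker and surrounding whitespace from every cell
dat <- data.frame(names = c("- First Name:", "- Last Name:", "- Age:", "- Movies:", "-", "-"),
                  matrix(str_trim(sub("^\\s*-", "", m3)), nrow = 6), stringsAsFactors = FALSE)
dat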
# names X1 X2
#1 - First Name: Hugh Scarlett
#2 - Last Name: Grant Johansson
#3 - Age: 54 29
#4 - Movies: Notting Hill The Avengers
#5 - 4 Weddings & A Funeral Chef
#6 - Music and Lyrics Lucy
# X3
#1 Mark
#2 Wahlberg
#3 43
#4 Ted
#5 Transformers: Age of Extinction
#6 <NA>
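The result above has one column per person, whereas the table asked for in the question has one row per person with the movies joined into a single field. The following is a minimal base-R sketch of that layout, not a definitive solution. It assumes every record follows the structure shown in the question (two name lines, then `First Name:`, `Last Name:`, `Age: NN`, `Movies:`, then any number of movie lines until the next record). The column names `First.Name`, `Last.Name`, `Age`, and `Movies` and the inline sample vector are illustrative choices; with the real file you would start from `readLines("movies.txt")` instead.

```r
# Sample input copied from the question; in practice use readLines("movies.txt")
lines <- c("- Page 1", "- Hugh", "- Grant", "- First Name:", "- Last Name:",
           "- Age: 54", "- Movies:", "- Notting Hill", "- 4 Weddings & A Funeral",
           "- Music and Lyrics", "- Scarlett", "- Johansson", "- First Name:",
           "- Last Name:", "- Age: 29", "- Movies:", "- The Avengers", "- Chef",
           "- Lucy", "- Page 2", "- Mark", "- Wahlberg", "- First Name:",
           "- Last Name:", "- Age: 43", "- Movies:", "- Ted",
           "- Transformers: Age of Extinction")

# Strip the leading "- " marker, then drop the "Page N" separator lines
x <- trimws(sub("^\\s*-\\s*", "", lines))
x <- x[!grepl("^Page [0-9]+$", x)]

# One "First Name:" and one "Movies:" label per record (assumed layout)
first  <- which(x == "First Name:")
movies <- which(x == "Movies:")
stopifnot(length(first) == length(movies))

# A record's movies run from the line after "Movies:" to the line just
# before the next record's first name (three lines above its "First Name:")
start <- movies + 1
end   <- c(first[-1] - 3, length(x))

dat <- data.frame(
  First.Name = x[first - 2],
  Last.Name  = x[first - 1],
  Age        = as.integer(sub("^Age:\\s*", "", x[first + 2])),
  Movies     = mapply(function(s, e) {
    # A record without any movie lines gets NA instead of a reversed range
    if (s > e) NA_character_ else paste(x[s:e], collapse = ", ")
  }, start, end),
  stringsAsFactors = FALSE
)
dat
```

Because the movie range is derived from the positions of the label lines, this sketch does not depend on each person having exactly three movies, and everything is computed with vectorised indexing, so roughly 16,000 lines should pose no problem.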
Only "a bit" messy? You say this is a CSV file, but your input does not show any delimiters.
Thanks! That was a big help.
No problem. Glad it helped.