R: Splitting columns at the [::] delimiter in the MovieLens-1M data

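I am new to R programming and, unfortunately, I have to work with the MovieLens-1M data. I would like to ask how to split the columns in movies.dat at the [::] delimiter. I tried the following code: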

> moviesDF<-read.delim("movies.dat", sep="|", header=F, stringsAsFactors=FALSE)
> str(moviesDF)
'data.frame':   3998 obs. of  3 variables:
 $ V1: chr  "1::Toy Story (1995)::Animation" "2::Jumanji (1995)::Adventure" "3::Grumpier Old Men (1995)::Comedy" "4::Waiting to Exhale (1995)::Comedy" ...
 $ V2: chr  "Children's" "Children's" "Romance" "Drama" ...
 $ V3: chr  "Comedy" "Fantasy" "" "" ...

Also, my goal is to build a recommender system.

You can try cSplit from my "splitstackshape" package. Usage is as follows:

library(splitstackshape)
cSplit(moviesDF, "V1", "::")
#            V2      V3 V1_1                     V1_2      V1_3
# 1: Children's  Comedy    1         Toy Story (1995) Animation
# 2: Children's Fantasy    2           Jumanji (1995) Adventure
# 3:    Romance            3  Grumpier Old Men (1995)    Comedy
# 4:      Drama            4 Waiting to Exhale (1995)    Comedy

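The problem lies in the import function. read.delim(sep="|") does not read the dataset correctly, because | only separates the individual values that end up in V3 (the genres), not the fields themselves. You should import the dataset with readLines and split each line at "::" instead: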

moviesDF <- readLines("movies.dat")
moviesDF <- as.data.frame(do.call("rbind",strsplit(moviesDF,"::")),stringsAsFactors = FALSE)
names(moviesDF) <- c("V1","V2","V3")
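
The same idea can be tried without the file. The following is a minimal sketch, assuming each line has the three-field layout MovieID::Title::Genres suggested by the str() output in the question. The sample lines are reconstructed from that output, and the column names MovieID, Title and Genres are illustrative choices, not names from the original post.

```r
# Three sample lines in the assumed movies.dat layout (MovieID::Title::Genres).
# With the real file this would be: lines <- readLines("movies.dat")
lines <- c(
  "1::Toy Story (1995)::Animation|Children's|Comedy",
  "2::Jumanji (1995)::Adventure|Children's|Fantasy",
  "3::Grumpier Old Men (1995)::Comedy|Romance"
)

# fixed = TRUE treats "::" as a literal string rather than a regular expression
parts <- strsplit(lines, "::", fixed = TRUE)

# Every line yields exactly three pieces, so rbind builds a 3-column matrix
movies <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(movies) <- c("MovieID", "Title", "Genres")  # illustrative names (assumption)
movies$MovieID <- as.integer(movies$MovieID)

# "|" is a regex metacharacter, so fixed = TRUE is needed to split on it literally;
# the result is a list with one character vector of genres per movie
genre_list <- strsplit(movies$Genres, "|", fixed = TRUE)

print(movies)
print(genre_list[[1]])
```

Keeping the genres in a separate list avoids the problem in the question, where sep="|" scattered them across V2 and V3.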

Here is a start:
unlist(strsplit("1::Toy Story (1995)::Animation", "::"))