Combining regular expressions with vectors for text replacement in R

I have a data frame with columns for track titles, artists, and music genres that I scraped from Spotify. For reproducibility, in R:

a <- c("Run the World (Girls)", "LOCO", "Habits", "Never Born - 2017 Version")
b <- c("Beyoncé", "NERVO", "Marmozets", "Guano Apes")
c <- c("dance pop pop post-teen pop r&b", "australian dance big room deep big room edm electro house house progressive electro house progressive house", "alt-indie rock british alternative rock pixie", "alternative metal funk metal nu metal post-grunge rap metal rap rock")
df <- data.frame(SONG=a, ARTIST=b, GENRE=c)
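The vector matching works, but I'm not sure how to extend it to gsub, using a vector of candidate replacements that would itself supply the values being substituted in.
I haven't seen this done with vectors, so apologies if this is a repost. Thanks in advance.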

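It might help to show the input and output in tabular form. I don't completely follow your logic.

@TimBiegeleisen, I apologize for the unclear question. Does the picture I added help, or is there other information I can provide or clarify?

This is absolutely right! I didn't even have tidytext installed, so I wouldn't have thought of this in a million years. I'll be doing text mining in the future, so I'm glad you steered me in this direction, even though I don't yet understand how this command works. Thanks for your time and help!

Oh, you can use just the tidyverse:
transform(df, GENRE = regmatches(GENRE, gregexpr(paste(main_genres, collapse = "|"), GENRE))) %>% tidyr::unnest() %>% unique()
or even
df %>% dplyr::mutate(GENRE = stringr::str_extract_all(GENRE, paste(main_genres, collapse = "|"))) %>% tidyr::unnest() %>% unique()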
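# Use str_extract_all as a custom tokenizer so each matched main genre becomes its own row.
# glue::collapse() was renamed glue::glue_collapse() in newer glue releases.
df %>%
     tidytext::unnest_tokens(GENRE, GENRE, stringr::str_extract_all,
                             pattern = glue::glue_collapse(main_genres, "|")) %>%
     unique %>%
     `rownames<-`(NULL)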
                       SONG     ARTIST GENRE
1     Run the World (Girls)    Beyoncé dance
2     Run the World (Girls)    Beyoncé   pop
3                      LOCO      NERVO dance
4                    Habits  Marmozets indie
5                    Habits  Marmozets  rock
6 Never Born - 2017 Version Guano Apes metal
7 Never Born - 2017 Version Guano Apes  rock
all_main_genres <- data.frame(TRACK = character(), ARTIST = character(), GENRE = character())
sapply(main_genres, grepl, playlist_genres$GENRE[row], ignore.case = TRUE)
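The sapply()/grepl() call above returns a named logical vector for a single row, with one entry per candidate genre. As a rough sketch of how that could be turned into one output row per matched genre: the main_genres vector below is an assumption (its original definition is not shown in this post), and two rows of the question's df stand in for playlist_genres.

```r
# Assumed list of main genres; the original vector is not shown in the post.
main_genres <- c("dance", "pop", "indie", "rock", "metal")

# Two rows from the question's example data.
df <- data.frame(
  SONG   = c("LOCO", "Habits"),
  ARTIST = c("NERVO", "Marmozets"),
  GENRE  = c("australian dance big room deep big room edm electro house house progressive electro house progressive house",
             "alt-indie rock british alternative rock pixie"),
  stringsAsFactors = FALSE
)

# For each row, test every main genre against the GENRE string,
# then emit one output row per genre that matched.
rows <- lapply(seq_len(nrow(df)), function(i) {
  hit <- sapply(main_genres, grepl, df$GENRE[i], ignore.case = TRUE)
  n <- sum(hit)  # number of main genres found in this row (may be 0)
  data.frame(SONG   = rep(df$SONG[i], n),
             ARTIST = rep(df$ARTIST[i], n),
             GENRE  = names(hit)[hit],
             stringsAsFactors = FALSE)
})

# Stack the per-row results into a single long data frame.
all_main_genres <- do.call(rbind, rows)
print(all_main_genres)
```

Because grepl() matches substrings, a short genre name such as "pop" would also match inside longer words; that behaviour is the same as in the original call.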
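# Extract every main-genre match per row, then repeat SONG/ARTIST once per match and drop duplicates
GENRE <- regmatches(df$GENRE, gregexpr(paste(main_genres, collapse = "|"), df$GENRE))
unique(transform(df[rep(1:nrow(df), lengths(GENRE)), 1:2], GENRE = unlist(GENRE), row.names = NULL))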
                        SONG     ARTIST GENRE
1      Run the World (Girls)    Beyoncé dance
2      Run the World (Girls)    Beyoncé   pop
5                       LOCO      NERVO dance
6                     Habits  Marmozets indie
7                     Habits  Marmozets  rock
9  Never Born - 2017 Version Guano Apes metal
13 Never Born - 2017 Version Guano Apes  rock
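To unpack what the base R answer does, here is a step-by-step sketch of the same idea using only base R. The main_genres vector is an assumption (it is not defined in this post); it was chosen to be consistent with the output shown above. The intermediate names pattern, matches, n, long and result are illustrative only.

```r
# Assumed list of main genres; not defined in the original post.
main_genres <- c("dance", "pop", "indie", "rock", "metal")

df <- data.frame(
  SONG   = c("Run the World (Girls)", "LOCO", "Habits", "Never Born - 2017 Version"),
  ARTIST = c("Beyoncé", "NERVO", "Marmozets", "Guano Apes"),
  GENRE  = c("dance pop pop post-teen pop r&b",
             "australian dance big room deep big room edm electro house house progressive electro house progressive house",
             "alt-indie rock british alternative rock pixie",
             "alternative metal funk metal nu metal post-grunge rap metal rap rock"),
  stringsAsFactors = FALSE
)

# 1. Collapse the genre vector into one alternation pattern: "dance|pop|indie|rock|metal".
pattern <- paste(main_genres, collapse = "|")

# 2. Pull out every match per row; the result is a list with one character vector per row.
matches <- regmatches(df$GENRE, gregexpr(pattern, df$GENRE))

# 3. Repeat SONG and ARTIST once per match so they line up with the flattened matches.
n <- lengths(matches)
long <- data.frame(SONG   = rep(df$SONG, n),
                   ARTIST = rep(df$ARTIST, n),
                   GENRE  = unlist(matches),
                   stringsAsFactors = FALSE)

# 4. Drop repeated genres within a song and renumber the rows.
result <- unique(long)
rownames(result) <- NULL
print(result)
```

Resetting the row names at the end is optional; without it, the surviving rows keep their positions from the expanded data frame, which is why the answer's output is numbered 1, 2, 5, 6, 7, 9, 13.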