Combining regular expressions with vectors for text replacement in R

I have a data frame with columns for track titles, artists, and music genres that I scraped from Spotify. To reproduce it in R:

a <- c("Run the World (Girls)", "LOCO", "Habits", "Never Born - 2017 Version")
b <- c("Beyoncé", "NERVO", "Marmozets", "Guano Apes")
c <- c("dance pop pop post-teen pop r&b", "australian dance big room deep big room edm electro house house progressive electro house progressive house", "alt-indie rock british alternative rock pixie", "alternative metal funk metal nu metal post-grunge rap metal rap rock")
df <- data.frame(SONG=a, ARTIST=b, GENRE=c)

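The vector matching works, but I am not sure how to extend it to gsub, using a vector of potential replacements that would itself serve as the replacement text.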

I haven't seen this done with vectors, so apologies if this is a repost. Thanks in advance.

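It might help to show the input and output in table form. I don't fully follow your logic.

@TimBiegeleisen, apologies for the unclear question. Does the image I added help, or is there other information I can provide or clarify?

This is absolutely right! I didn't even have tidytext installed, so I would never have thought of this in a million years. I'll be doing text mining in the future, so I'm glad you pointed me in this direction, even though I don't yet understand how this command works. Thanks for your time and help!

Oh, you can also do this with just the tidyverse:

transform(df, GENRE = regmatches(GENRE, gregexpr(paste(main_genres, collapse = "|"), GENRE))) %>% tidyr::unnest(GENRE) %>% unique()

or even

df %>% dplyr::mutate(GENRE = stringr::str_extract_all(GENRE, paste(main_genres, collapse = "|"))) %>% tidyr::unnest(GENRE) %>% unique()

etc.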

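library(magrittr)  # provides %>%

# Use str_extract_all as the tokenizer so each matched main genre becomes its own row.
# glue::collapse() has since been renamed glue::glue_collapse().
df %>%
  tidytext::unnest_tokens(GENRE, GENRE, stringr::str_extract_all,
                          pattern = glue::glue_collapse(main_genres, "|")) %>%
  unique() %>%
  `rownames<-`(NULL)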
                       SONG     ARTIST GENRE
1     Run the World (Girls)    Beyoncé dance
2     Run the World (Girls)    Beyoncé   pop
3                      LOCO      NERVO dance
4                    Habits  Marmozets indie
5                    Habits  Marmozets  rock
6 Never Born - 2017 Version Guano Apes metal
7 Never Born - 2017 Version Guano Apes  rock

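# Empty data frame, apparently meant to collect one row per track, artist, and main genre
all_main_genres <- data.frame(TRACK = character(), ARTIST = character(), GENRE = character())

# For a single row, test which entries of main_genres occur in that row's genre string
sapply(main_genres, grepl, playlist_genres$GENRE[row], ignore.case = TRUE)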
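
The sapply/grepl call above tests one row at a time. As a rough sketch of how the same idea could cover every row at once, the block below builds a logical matrix (rows are tracks, columns are genres) and reads the TRUE cells back out as SONG/ARTIST/GENRE rows. The contents of main_genres are an assumption, since the post never defines that vector; the five values are inferred from the output shown in the answers. The name playlist is a stand-in for the asker's data frame.

```r
# Example data from the question
playlist <- data.frame(
  SONG   = c("Run the World (Girls)", "LOCO", "Habits", "Never Born - 2017 Version"),
  ARTIST = c("Beyoncé", "NERVO", "Marmozets", "Guano Apes"),
  GENRE  = c("dance pop pop post-teen pop r&b",
             "australian dance big room deep big room edm electro house house progressive electro house progressive house",
             "alt-indie rock british alternative rock pixie",
             "alternative metal funk metal nu metal post-grunge rap metal rap rock"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: main_genres is not shown in the post; these values are inferred
main_genres <- c("dance", "pop", "indie", "rock", "metal")

# grepl() is vectorised over its text argument, so each genre yields one
# logical per row; sapply() binds those into a rows-by-genres matrix
hits <- sapply(main_genres, grepl, playlist$GENRE, ignore.case = TRUE)

# Positions of the TRUE cells as (row, col) pairs, sorted by track then genre
idx <- which(hits, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# One output row per (track, matched genre)
all_main_genres <- data.frame(
  SONG   = playlist$SONG[idx[, "row"]],
  ARTIST = playlist$ARTIST[idx[, "row"]],
  GENRE  = main_genres[idx[, "col"]],
  stringsAsFactors = FALSE
)
```

Because grepl only reports whether a genre occurs, each genre appears at most once per track, so no unique() step is needed. Within a track the genres come out in main_genres order rather than in order of appearance in the string.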

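# Extract every main-genre match per row (a list with one character vector per row)
GENRE <- regmatches(df$GENRE, gregexpr(paste(main_genres, collapse = "|"), df$GENRE))
# Repeat each SONG/ARTIST row once per match, attach the matches, then drop duplicate rows
unique(transform(df[rep(seq_len(nrow(df)), lengths(GENRE)), 1:2], GENRE = unlist(GENRE), row.names = NULL))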
                        SONG     ARTIST GENRE
1      Run the World (Girls)    Beyoncé dance
2      Run the World (Girls)    Beyoncé   pop
5                       LOCO      NERVO dance
6                     Habits  Marmozets indie
7                     Habits  Marmozets  rock
9  Never Born - 2017 Version Guano Apes metal
13 Never Born - 2017 Version Guano Apes  rock
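
To make the base R approach easier to follow, here is the same idea unpacked into steps. This is only a sketch: main_genres is again an assumed vector (the post never shows it), and extract_genres is a hypothetical helper name, not something from the answer. Unlike the answer, it removes duplicates within each row before expanding, so the row names come out consecutive.

```r
# Hypothetical helper: one output row per distinct main genre found in each input row
extract_genres <- function(data, genres, col = "GENRE") {
  pattern <- paste(genres, collapse = "|")            # alternation, e.g. "dance|pop"
  text    <- as.character(data[[col]])
  # gregexpr() locates every match; regmatches() pulls the matched substrings out
  found   <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE))
  found   <- lapply(found, function(x) unique(tolower(x)))  # dedupe within a row
  keep    <- setdiff(names(data), col)
  # Repeat each row once per distinct match, then attach the matches
  out     <- data[rep(seq_len(nrow(data)), lengths(found)), keep, drop = FALSE]
  out[[col]] <- as.character(unlist(found))
  rownames(out) <- NULL
  out
}

# Example data from the question
df <- data.frame(
  SONG   = c("Run the World (Girls)", "LOCO", "Habits", "Never Born - 2017 Version"),
  ARTIST = c("Beyoncé", "NERVO", "Marmozets", "Guano Apes"),
  GENRE  = c("dance pop pop post-teen pop r&b",
             "australian dance big room deep big room edm electro house house progressive electro house progressive house",
             "alt-indie rock british alternative rock pixie",
             "alternative metal funk metal nu metal post-grunge rap metal rap rock"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: main_genres is not shown in the post; these values are inferred
main_genres <- c("dance", "pop", "indie", "rock", "metal")

result <- extract_genres(df, main_genres)
```

Note that the alternation matches substrings, which is why "indie" is found inside "alt-indie". If whole-word matching were wanted instead, each genre could be wrapped in \\b word boundaries before pasting.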