R: Extract certain characters from a list and convert them to a character vector - Fatal编程技术网

One column of my data frame is a list of character vectors. This is the column categories:

str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   4 obs. of  3 variables:
 $ categories:List of 4
  ..$ : chr  "Tex-Mex" "Mexican" "Fast Food" "Restaurants"
  ..$ : chr  "Hawaiian" "Restaurants" "Barbeque"
  ..$ : chr  "Restaurants" "Italian" "Seafood"
  ..$ : chr  "Restaurants" "Mexican" "American (Traditional)"
 $ name      : chr  "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
 $ type      : chr  "business" "business" "business" "business"
Here is the dput of the first four rows:

structure(list(categories = list(c("Tex-Mex", "Mexican", "Fast Food", 
"Restaurants"), c("Hawaiian", "Restaurants", "Barbeque"), c("Restaurants", 
"Italian", "Seafood"), c("Restaurants", "Mexican", "American (Traditional)"
)), name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill", 
"Don Tequila"), type = c("business", "business", "business", 
"business")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"), .Names = c("categories", "name", "type"))
I want to extract certain values from this list so that only those values remain in each vector.

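For example, I want to filter out every value that is not "Mexican" or "Restaurants", so that the only values left are "Mexican" and "Restaurants". To do this, I tried the following: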

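library(dplyr)
library(tidyr)
library(stringr)
# Keep only rows whose category matches either pattern, then nest again
df_test <- df %>% unnest(categories) %>% 
          filter(str_detect(categories, "Mexican") |
                 str_detect(categories, "Restaurants")) %>% 
          nest(categories)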
The problem is that afterwards the column is no longer a character vector like the type column.

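Is it possible to filter these values so that, after this step, the column is an ordinary character vector like the name and type columns? I don't want to replace the values/rows removed by this process: if a row contains neither "Mexican" nor "Restaurants", that row should be dropped.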

Packages used: dplyr, stringr
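The unnest, filter, re-nest idea from the question can be sketched in base R so that the last step collapses the surviving values into one string instead of nesting them back into a list. This is a minimal sketch under stated assumptions, not the asker's or answerer's code: it rebuilds the four example rows by hand, and keep, long, groups and result are hypothetical names introduced only for illustration.

```r
# Rebuild the four example rows from the dput above (base R only).
df <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill", "Don Tequila"),
  type = "business",
  stringsAsFactors = FALSE
)
df$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Restaurants", "Italian", "Seafood"),
  c("Restaurants", "Mexican", "American (Traditional)")
)

keep <- c("Mexican", "Restaurants")  # hypothetical helper: values to retain

# "Unnest": one row per (original row, category) pair. This long form is why
# rows appear to multiply after unnesting.
long <- data.frame(
  row = rep(seq_len(nrow(df)), lengths(df$categories)),
  category = unlist(df$categories),
  stringsAsFactors = FALSE
)

# Exact matching with %in%; unlike a regex such as str_detect(), it never
# matches substrings.
long <- long[long$category %in% keep, ]

# Group the surviving values by original row; rows with no match form no group
# and are therefore dropped instead of being filled with NA.
groups <- split(long$category, long$row)
result <- df[as.integer(names(groups)), c("name", "type")]

# Collapse each group to one string, so categories is an atomic character column.
result$categories <- vapply(groups, function(x) paste(sort(x), collapse = " "),
                            character(1), USE.NAMES = FALSE)
```

Collapsing with paste() rather than re-nesting is what turns the list column back into a plain character column; the trade-off is that the individual values are now joined into one string per row.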
Use lapply to subset the list (df1 here holds the data from the dput above):

lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])

[[1]]
[1] "Mexican"     "Restaurants"

[[2]]
[1] "Restaurants"

[[3]]
[1] "Restaurants"

[[4]]
[1] "Restaurants" "Mexican"
Add a row with no matching category to show that rows left empty after filtering are dropped:

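# Append a row whose categories contain nothing to keep
df1 <- rbind(df1, c(list("Nothing to match"), "drop me", "business"))
df1$categories <- lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
# Drop rows whose category vector is now empty
df1[sapply(df1$categories, length) > 0, ]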
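The row-dropping step above can also be written with base R's lengths(), available since R 3.2.0. This is a minimal sketch on a hypothetical three-row data frame whose last row mirrors the answer's "drop me" row; keep and filtered are illustrative names, not part of the original answer.

```r
# Hypothetical three-row example; the third row mirrors the answer's
# "drop me" row and has no category to keep.
df1 <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "drop me"),
  type = "business",
  stringsAsFactors = FALSE
)
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  "Nothing to match"
)

keep <- c("Mexican", "Restaurants")

# Subset each list element, as in the answer; "drop me" becomes character(0).
df1$categories <- lapply(df1$categories, function(x) x[x %in% keep])

# lengths() returns the length of every element in one call, the same values
# as sapply(df1$categories, length) but always as an integer vector.
filtered <- df1[lengths(df1$categories) > 0, ]
```

Note that filtered$categories is still a list column at this point; only the empty rows are gone.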

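Please include all the packages you are using, not only dplyr. You would also do better to add a more complete data set (one or two additional variables) and show the desired result for it. For example, it is not clear whether you want to drop every row in which this column does not contain "Mexican", or replace that value with NA. Why use str_detect here? You can simply do
df %>% unnest() %>% filter(categories == 'Mexican')
I want to filter on several values. If I do that, some rows get multiplied.
Then just do
df %>% unnest() %>% filter(categories %in% c('Mexican', 'Restaurants'))
Thanks, it works, but the vector is still a list, not a character vector.
@Banjo Do you want to paste all the remaining entries into a single string?
Yes, the characters should form one vector.
@Banjo Added an example that collapses the list into a single string: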
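# Append a row with nothing to match, then keep only the wanted values
df1 <- rbind(df1, c(list("Nothing to match"), "drop me", "business"))
df1$categories <- lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
df1[sapply(df1$categories, length) > 0, ]
# Collapse each row's remaining values into one string, giving a character column
df1$categories <- sapply(df1$categories, function(x) paste(sort(x[x %in% c("Mexican", "Restaurants")]), collapse=" "))
# Rows with nothing left now hold "" and are dropped here
df1[nchar(df1$categories) > 0, ]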

# A tibble: 4 x 3
           categories                     name     type
                <chr>                    <chr>    <chr>
1 Mexican Restaurants                Taco Bell business
2         Restaurants       Ohana Hawaiian BBQ business
3         Restaurants Carrabba's Italian Grill business
4 Mexican Restaurants              Don Tequila business
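A variant of the collapsing step uses vapply() instead of sapply(), which guarantees the new column is a character vector: vapply() errors if any call does not return exactly one string, whereas sapply() returns list() for zero-length input. This is a minimal sketch on a hypothetical three-row data frame (the last row mirrors the answer's "drop me" row); keep and result are illustrative names.

```r
# Hypothetical three-row example; the third row has no category to keep.
df1 <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "drop me"),
  type = "business",
  stringsAsFactors = FALSE
)
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  "Nothing to match"
)

keep <- c("Mexican", "Restaurants")

# vapply() requires every call to return exactly one string, so the column
# replaced here is always an atomic character vector.
df1$categories <- vapply(
  df1$categories,
  function(x) paste(sort(x[x %in% keep]), collapse = " "),
  character(1)
)

# Rows with nothing left hold "" after collapsing; nzchar() drops them.
result <- df1[nzchar(df1$categories), ]
```

nzchar() is used here in place of nchar(...) > 0; both test for non-empty strings.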