R: search text for specific names taken from another data frame and return the matches in a new column (multiple occurrences)
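I need some help searching a text column (the body text) of one data frame for multiple specific words listed in another data frame, and then pulling the matches into a new column. Further explanation:
- First, I have a data frame containing a large list of text summaries covering 14 countries.
- Second, I have a second data frame containing the names of all administrative level 2 (lvl_2) units, such as provinces, villages, etc.
- Basically, I want to extract from the long summaries any mentions of these specific adm2 province/village names, and create a new column holding each matched name, pivoted longer (one row per match).

Below is some sample data you can use to reproduce my problem. It contains two data frames: (1) test_admin, the list of administrative levels I want to search for, and (2) test_dataset$Summary, the column I want to run the search on. (You can ignore the values of the Other_ variables; in the real dataset they are filled with many values.)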
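I have tried some mutate and grepl functions without success. The other examples I found only seem to work for exact values or for a single search term. Thanks for your help!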
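A likely reason a plain grepl() attempt fails here: grepl() is vectorised over the text (x) but not over pattern, so passing a whole vector of names only tests the first one. The minimal base R sketch below loops over the names instead. The variables summary_text and adm2_names are hypothetical illustrations, not the asker's real data.

```r
# Hypothetical example data (assumption, not the asker's real data)
summary_text <- "In Dadu AND East Karachi, this happened"
adm2_names <- c("Central Karachi", "Dadu", "East Karachi", "Ghotki")

# grepl(adm2_names, summary_text) would only use adm2_names[1] as the
# pattern (with a warning), because grepl() is not vectorised over `pattern`.

# Instead, apply grepl() once per name; fixed = TRUE matches each name
# literally rather than as a regular expression.
hits <- vapply(adm2_names, grepl, logical(1), x = summary_text, fixed = TRUE)
matched <- adm2_names[hits]
matched  # returns c("Dadu", "East Karachi")
```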
(A tidyverse solution is preferred.)

Here is one approach:
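library(tidyverse)

# For each row of test_dataset, flag the admin rows whose province (adm1)
# or district (adm2) name appears in that row's summary text
map_df(seq(nrow(test_dataset)), function(i) {
  inds <- str_detect(test_dataset$Summary[i], test_admin$adm1_name) |
    str_detect(test_dataset$Summary[i], test_admin$adm2_name)
  # One output row per matched adm2 name; a single NA row when nothing matched
  if (any(inds)) tibble(test_dataset[i, ], Locations = test_admin$adm2_name[inds])
  else tibble(test_dataset[i, ], Locations = NA)
})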
# Summary Other_Variable_1 Other_Variable_2 Locations
# <chr> <int> <int> <chr>
# 1 In Cox's Bazar, this and that happened. 1 1 NA
# 2 In Yangon, something else happened 2 2 NA
# 3 In Central Karachi, this happened 3 3 Central Karachi
# 4 In Sindh, this happened 4 4 Central Karachi
# 5 In Sindh, this happened 4 4 Dadu
# 6 In Sindh, this happened 4 4 East Karachi
# 7 In Sindh, this happened 4 4 Ghotki
# 8 In Sindh, this happened 4 4 Sujawal
# 9 In Sindh, this happened 4 4 Sukkur
#10 In Dadu AND East Karachi, this happened 5 5 Dadu
#11 In Dadu AND East Karachi, this happened 5 5 East Karachi
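For comparison, here is a base R sketch of the same logic that needs no packages. It assumes hypothetical sample data reconstructed from the printed output above (the real test_admin and test_dataset are not shown here and will be larger), and it swaps str_detect()'s regular-expression matching for literal matching with grepl(fixed = TRUE), so names containing characters such as ( or . are not treated as regex syntax. The helper find_locations() is a hypothetical name introduced for illustration.

```r
# Hypothetical sample data (assumption): reconstructed from the printed output
test_admin <- data.frame(
  adm1_name = "Sindh",
  adm2_name = c("Central Karachi", "Dadu", "East Karachi",
                "Ghotki", "Sujawal", "Sukkur"),
  stringsAsFactors = FALSE
)
test_dataset <- data.frame(
  Summary = c("In Cox's Bazar, this and that happened.",
              "In Yangon, something else happened",
              "In Central Karachi, this happened",
              "In Sindh, this happened",
              "In Dadu AND East Karachi, this happened"),
  Other_Variable_1 = 1:5,
  Other_Variable_2 = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one summary, return the adm2 names whose own name
# or parent adm1 name appears literally in the text, or NA if none match.
find_locations <- function(text) {
  hit <- vapply(seq_len(nrow(test_admin)), function(j) {
    grepl(test_admin$adm1_name[j], text, fixed = TRUE) ||
      grepl(test_admin$adm2_name[j], text, fixed = TRUE)
  }, logical(1))
  if (any(hit)) test_admin$adm2_name[hit] else NA_character_
}

locs <- lapply(test_dataset$Summary, find_locations)

# "Pivot longer": repeat each row once per match, then attach the matches
result <- test_dataset[rep(seq_len(nrow(test_dataset)), lengths(locs)), ]
result$Locations <- unlist(locs)
rownames(result) <- NULL
result
```

As in the answer, a mention of a province (adm1_name) returns every adm2 name in that province, which is why "In Sindh, this happened" expands to six rows. Literal matching still matches substrings, so a name could also match inside a longer word; adding word-boundary anchors (with the names regex-escaped) would be one way to tighten this.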
Excellent, thank you very much! It works perfectly.