R: extract country names (or other entities) from a column
I have a data.frame whose location column contains both countries and cities, and I want to extract the countries by matching against world.cities$country.etc from library(maps) (or any other collection of country names). Consider this example:
df <- data.frame(location = c("Aarup, Denmark",
                              "Switzerland",
                              "Estonia: Aaspere"),
                 other_col = c(2, 3, 4))
but without success. I'm expecting something like this:
          location other_col     country rest_location
1   Aarup, Denmark         2     Denmark        Aarup,
2      Switzerland         3 Switzerland
3 Estonia: Aaspere         4     Estonia     : Aaspere
You could try this as a starting point:
library(maps)       # provides the world.cities data frame
library(tidyverse)

df %>%
  rownames_to_column() %>%                 # keep a row id to re-join on later
  separate_rows(location) %>%              # one row per word of location
  mutate(gr = location %in% world.cities$country.etc) %>%
  mutate(gr = ifelse(gr, "country", "rest_location")) %>%
  spread(gr, location) %>%                 # words -> country / rest_location columns
  right_join(df %>%
               rownames_to_column(),
             by = c("rowname", "other_col")) %>%
  select(location, other_col, country, rest_location)
          location other_col     country rest_location
1   Aarup, Denmark         2     Denmark         Aarup
2      Switzerland         3 Switzerland          <NA>
3 Estonia: Aaspere         4     Estonia       Aaspere
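Note that this only works if each entry in the location column has at most two words, because by default separate_rows splits on every non-alphanumeric character (so a country such as "New Zealand" would be broken apart). If necessary, pass a suitable separator, e.g. sep = ",\\s*|:\\s*", which splits on a comma or colon and also drops the whitespace that follows it.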
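To illustrate the separator option, here is a minimal sketch assuming tidyr is installed. The locs data frame and its "Wellington, New Zealand" row are hypothetical, added only to show a two-word country name surviving the split.

```r
library(tidyr)

# Hypothetical input: the first row contains a two-word country name.
locs <- data.frame(location = c("Wellington, New Zealand", "Switzerland", "Estonia: Aaspere"),
                   other_col = c(2, 3, 4),
                   stringsAsFactors = FALSE)

# The default sep ("[^[:alnum:].]+") would also split "New Zealand" into two words.
# Splitting only on a comma or colon (plus any following spaces) keeps it intact.
split_locs <- separate_rows(locs, location, sep = ",\\s*|:\\s*")
split_locs$location
```

Each resulting word can then be tested with %in% world.cities$country.etc as in the pipeline above; because multi-word names are no longer split, they can match as a whole.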
We can create a pattern by pasting all the country names together, use str_extract_all to get every country name in location that matches the pattern, and remove the words matching the country names to get the remaining location.
We use sapply with toString for the countries because, if a location contains more than one country name, they are all concatenated into a single string.
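One way to sketch that approach with stringr (an illustration, not necessarily the answerer's exact code). To keep it self-contained it uses a short hardcoded countries vector as a stand-in, which is an assumption; with library(maps) loaded you would use unique(world.cities$country.etc) instead.

```r
library(stringr)

df <- data.frame(location = c("Aarup, Denmark", "Switzerland", "Estonia: Aaspere"),
                 other_col = c(2, 3, 4),
                 stringsAsFactors = FALSE)

# Assumption: stand-in for unique(world.cities$country.etc) from the maps package.
countries <- c("Denmark", "Switzerland", "Estonia", "Germany")

# One alternation pattern with word boundaries: "\\b(Denmark|Switzerland|...)\\b"
pattern <- paste0("\\b(", paste(countries, collapse = "|"), ")\\b")

# str_extract_all returns a list with one character vector per row;
# toString joins several matches into a single "A, B" string ("" if none).
df$country <- sapply(str_extract_all(df$location, pattern), toString)

# Remove the matched country names, then leftover punctuation and extra spaces.
rest <- str_remove_all(df$location, pattern)
df$rest_location <- str_squish(str_remove_all(rest, "[[:punct:]]"))
df
```

With the full world.cities list, country names containing regex metacharacters would need to be escaped before being pasted into the pattern.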
A base R solution (apart from loading the maps package for world.cities):
# Import the library (provides the world.cities data frame):
library(maps)
# Split each location on whitespace (as.character guards against factor columns):
country_city_vec <- strsplit(as.character(df$location), "\\s+")
# Repeat other_col once per word and strip punctuation from each word:
rolled_out_df <- data.frame(other_col = rep(df$other_col, lengths(country_city_vec)),
                            location = gsub("[[:punct:]]", "", unlist(country_city_vec)),
                            stringsAsFactors = FALSE)
# Keep only the words that are country names and merge them back by other_col
# (this assumes other_col uniquely identifies each row):
matched_with_world_df <- merge(df,
                               setNames(rolled_out_df[rolled_out_df$location %in% world.cities$country.etc, ],
                                        c("other_col", "country")),
                               by = "other_col", all.x = TRUE)
# Remove the matched country names and punctuation to get the city/location drilldown:
matched_with_world_df$rest_location <- trimws(gsub("[[:punct:]]", "",
                                                   gsub(paste0(matched_with_world_df$country,
                                                               collapse = "|"),
                                                        "", matched_with_world_df$location)),
                                              "both")