R: Remove geographic locations from text



Is there any way to remove country names and city names from text without using a list?

Example data:

data.frame(text = c("I love to travel to London"), stringsAsFactors = FALSE)

I would suggest using a list, for example:

library(maps)
head(world.cities)
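Once you have such a list, one simple way to use it is to split each text into words and drop every word that appears in the list. The sketch below uses a small hand-written vector `places` as a hypothetical stand-in for `unique(c(world.cities$name, world.cities$country.etc))`, and the helper name `remove_place_words` is made up for illustration. It only handles single-word names; multi-word places such as "New York" need a regex-based approach instead.

```r
# Hypothetical stand-in for unique(c(world.cities$name, world.cities$country.etc))
places <- c("London", "Germany", "Paris", "France")

# Hypothetical helper: drop any word that matches an entry in `places`
remove_place_words <- function(text, places) {
  vapply(strsplit(text, " ", fixed = TRUE), function(words) {
    # Ignore trailing punctuation when comparing, e.g. "London." -> "London"
    bare <- gsub("[[:punct:]]+$", "", words)
    paste(words[!bare %in% places], collapse = " ")
  }, character(1))
}

txt <- c("I love to travel to London", "Germany was a fun country to visit.")
remove_place_words(txt, places)
# [1] "I love to travel to"         "was a fun country to visit."
```

Using `%in%` keeps the lookup a single vectorised set-membership test per text, which stays fast even with tens of thousands of names.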

Here is a quick attempt at what you are looking for:

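library(maps)  # provides the world.cities data set
data(world.cities)

df <- data.frame(text = c("I love to travel to London",
                          "Germany was a fun country to visit."),
                 stringsAsFactors = FALSE)

replace_cities_countries <- function(string, replacement) {
  # Every city and country name in world.cities, longest first,
  # so that "New York" is replaced before "York" can match inside it
  patterns <- unique(c(world.cities$name, world.cities$country.etc))
  patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
  for (p in patterns) {
    # \Q...\E treats the name literally; \b restricts matches to whole words
    string <- gsub(paste0("\\b\\Q", p, "\\E\\b"), replacement, string, perl = TRUE)
  }
  string
}

# gsub() is vectorised, so the whole column can be passed in one call
replace_cities_countries(df$text, replacement = 'HOORAY!')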
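The loop above calls `gsub()` once per name, which is slow for a list the size of `world.cities`. A possible alternative, sketched here under the assumption that the list is small enough, is to join all names into one alternation pattern and run a single `gsub()`. The `places` vector and the helper `replace_places_once` are hypothetical. With the full `world.cities` list (tens of thousands of names) one pattern may exceed PCRE's pattern-size limits, so splitting the list into chunks would likely be needed; that variant is not shown or tested here.

```r
places <- c("New York", "York", "London", "Germany")  # hypothetical lookup list

# Hypothetical helper: replace every listed place with one regex pass
replace_places_once <- function(text, places, replacement) {
  # Longest names first, so the alternation tries "New York" before "York"
  places <- places[order(nchar(places), decreasing = TRUE)]
  # \Q...\E makes PCRE treat each name literally; \b keeps whole-word matches
  alternation <- paste0("\\Q", places, "\\E", collapse = "|")
  pattern <- paste0("\\b(?:", alternation, ")\\b")
  gsub(pattern, replacement, text, perl = TRUE)
}

txt <- c("I love to travel to London", "New York beats York.", "Yorkshire is nice")
replace_places_once(txt, places, "HOORAY!")
# [1] "I love to travel to HOORAY!" "HOORAY! beats HOORAY!."      "Yorkshire is nice"
```

The word boundaries are what keep "Yorkshire" intact; without them, "York" would be replaced inside it.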

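I can't think of a way to do this that is 100% accurate without a lookup list. Here is a messy approach that avoids a list: if your data is well formatted (for example, "London" is written with a capital "L"), you can extract all capitalized words and check each one against a geocoding or address-validation API to see whether it returns a result. If the result is non-empty, the word is a city or country and you can remove it from your text. "London", for instance, may return several results (London, Ontario; London, UK; and so on), which is fine because any hit counts. Be careful, though: some APIs (such as Google's) try to return the "closest match" to what they think you meant, so they return something for almost any input and this check will not work with them. If you don't want to hard-code anything, a list is definitely much easier. You can fetch it from an online source programmatically, so that every time the script runs it picks up the latest data.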
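The list-free idea above can be sketched as two steps: collect capitalized words as candidates, then keep only those a validator accepts. In this sketch `is_place` is a hypothetical placeholder that checks a tiny hard-coded vector; it stands in for a real geocoding or address-validation request, which is not shown and whose API you would have to supply. The helper `remove_validated_places` is likewise made up for illustration.

```r
# Hypothetical validator: a placeholder for a call to a geocoding or
# address-validation service. Replace the body with a real API request.
is_place <- function(word) {
  word %in% c("London", "Germany")
}

# Hypothetical helper: remove capitalized words that the validator accepts
remove_validated_places <- function(text, validator = is_place) {
  # Candidates: capitalized words (relies on well-formatted input)
  candidates <- unique(unlist(regmatches(text, gregexpr("\\b[A-Z][a-z]+\\b", text))))
  hits <- candidates[vapply(candidates, validator, logical(1))]
  for (h in hits) {
    text <- gsub(paste0("\\b", h, "\\b"), "", text, perl = TRUE)
  }
  # Collapse doubled spaces and trim the ends left by the removals
  trimws(gsub(" {2,}", " ", text))
}

remove_validated_places(c("I love to travel to London",
                          "Germany was a fun country to visit."))
# [1] "I love to travel to"         "was a fun country to visit."
```

Calling the validator once per unique candidate, rather than once per occurrence, keeps the number of API requests down, which matters if the service is rate-limited.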